{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a6840947",
   "metadata": {},
   "source": [
    "# Corpus analysis\n",
    "\n",
    "<div class=\"alert alert-success\">\n",
    "\n",
    "**Update: Changes to v > 0.3.0**\n",
    "\n",
    "Some major changes have been made with the newest version of the **docuscospacy** package. Most don't affect the syntax of the basic functions. However, the package runs all processing in [polars](https://docs.pola.rs/api/python/stable/reference/index.html) for vastly increased speed. After processing, you can easily convert a polars DataFrame [to pandas](https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.to_pandas.html), if that is your preference for filtering and sorting.\n",
    "\n",
    "The package also now includes convenience functions like `corpus_from_folder` and `docuscope_parse` that make the processing pipeline easier and require fewer dependencies.\n",
    "\n",
    "Finally, though the syntax of the functions is largely unchanged from earlier versions, none of them require the passing of total counts anymore. All normalization takes place inside the functions for greater consistency.\n",
    "\n",
    "</div>\n",
    "\n",
    "The docuscospacy package supports the generation of:\n",
    "\n",
    "* Token frequency tables\n",
    "* Ngram tables\n",
    "* Collocation tables around a node word\n",
    "* Keyword comparisons against a reference corpus\n",
    "\n",
    "Most importantly, **outputs can be controlled either by part-of-speech or by DocuScope tag**. Thus, *can* as a noun and *can* as a verb, for example, can be disambiguated.\n",
    "\n",
    "Additionally, tagged multi-token sequences are aggregated for analysis. So, for example, where *in spite of* is tagged as a token sequence, it is combined into a single token."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "964a4d1a",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note: About tmtoolkit**\n",
    "\n",
    "The package no longer requires [tmtoolkit](https://tmtoolkit.readthedocs.io/en/latest/). However, there are functions to convert a tmtoolkit corpus to a docuscospacy DataFrame (`from_tmtoolkit`) and to convert a document-feature matrix to a COOrdinate format matrix (`dtm_to_coo`), which can then be analyzed inside tmtoolkit.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "36969fb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import spacy\n",
    "import docuscospacy as ds\n",
    "import polars as pl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f56fa193",
   "metadata": {},
   "source": [
    "## Processing a corpus\n",
    "\n",
    "Before we generate any counts or tables, we need to load a corpus and tokenize it. Be sure you have downloaded the `en_docusco_spacy` model from [the Hugging Face model repository](https://huggingface.co/browndw/en_docusco_spacy)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "766dedd9",
   "metadata": {},
   "source": [
    "To download and install the model into your environment, use either:\n",
    "\n",
    "`pip install https://huggingface.co/browndw/en_docusco_spacy/resolve/main/en_docusco_spacy-any-py3-none-any.whl`\n",
    "\n",
    "Or for some newer spaCy versions:\n",
    "\n",
    "`pip install \"en_docusco_spacy @ https://huggingface.co/browndw/en_docusco_spacy/resolve/main/en_docusco_spacy-any-py3-none-any.whl\"`\n",
    "\n",
    "\n",
    "### Load an instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbdbbdd4-cbec-403f-864e-8206234120bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "%pip install \"en_docusco_spacy @ https://huggingface.co/browndw/en_docusco_spacy/resolve/main/en_docusco_spacy-any-py3-none-any.whl\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8ec328b",
   "metadata": {},
   "outputs": [],
   "source": [
    "nlp = spacy.load(\"en_docusco_spacy\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "227aa81d",
   "metadata": {},
   "source": [
    "### Load a corpus from a directory\n",
    "\n",
    "One easy way to prepare a corpus for processing is to use the `corpus_from_folder` function, which reads plain text (TXT) files from a directory into a polars DataFrame with 'doc_id' and 'text' columns.\n",
    "\n",
    "The function **does not** search subdirectories recursively. For greater control, you can use the `get_text_paths` function, which has a recursive option, and then pass the returned list of file paths to `readtext`. This approach can also be useful if, for example, you have many files and want to test a pipeline with a subsample. In such a case, the list of paths can simply be down-sampled and the resulting subset read in using `readtext`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b93cf164",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_corpus = ds.corpus_from_folder(\"data/tar_corpus\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a801df36",
   "metadata": {},
   "source": [
    "Note the resulting data structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d7f180bf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 2)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th><th>text</th></tr><tr><td>str</td><td>str</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;In the field of plant biology,…</td></tr><tr><td>&quot;acad_02.txt&quot;</td><td>&quot;In my first paper for Complex …</td></tr><tr><td>&quot;acad_03.txt&quot;</td><td>&quot;At root, every hypothesis is a…</td></tr><tr><td>&quot;acad_04.txt&quot;</td><td>&quot;Several tests were administere…</td></tr><tr><td>&quot;acad_05.txt&quot;</td><td>&quot;The development of necking and…</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 2)\n",
       "┌─────────────┬─────────────────────────────────┐\n",
       "│ doc_id      ┆ text                            │\n",
       "│ ---         ┆ ---                             │\n",
       "│ str         ┆ str                             │\n",
       "╞═════════════╪═════════════════════════════════╡\n",
       "│ acad_01.txt ┆ In the field of plant biology,… │\n",
       "│ acad_02.txt ┆ In my first paper for Complex … │\n",
       "│ acad_03.txt ┆ At root, every hypothesis is a… │\n",
       "│ acad_04.txt ┆ Several tests were administere… │\n",
       "│ acad_05.txt ┆ The development of necking and… │\n",
       "└─────────────┴─────────────────────────────────┘"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds_corpus.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61f89ab8",
   "metadata": {},
   "source": [
    "This simple DataFrame structure is all that is required to process the corpus. Thus, if you want to read in a CSV file, a parquet file, or similar tabular data, you can simply use one of [the input options from polars](https://docs.pola.rs/api/python/stable/reference/io.html).\n",
    "\n",
    "The only requirements are that the first column is called 'doc_id' and contains a unique identifier and that the second column is called 'text' and contains a string."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f4025fe",
   "metadata": {},
   "source": [
    "### Process corpus\n",
    "\n",
    "To process a corpus, use the `docuscope_parse` function. The function requires a corpus DataFrame and the spaCy instance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "b9ab8b15",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds_tokens = ds.docuscope_parse(ds_corpus, nlp_model=nlp, n_process=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9136fbdd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (20, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th><th>token</th><th>pos_tag</th><th>ds_tag</th><th>pos_id</th><th>ds_id</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;In &quot;</td><td>&quot;II&quot;</td><td>&quot;Untagged&quot;</td><td>1</td><td>1</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;the &quot;</td><td>&quot;AT&quot;</td><td>&quot;Untagged&quot;</td><td>2</td><td>2</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;field &quot;</td><td>&quot;NN1&quot;</td><td>&quot;Untagged&quot;</td><td>3</td><td>3</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;of &quot;</td><td>&quot;IO&quot;</td><td>&quot;Untagged&quot;</td><td>4</td><td>4</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;plant &quot;</td><td>&quot;NN1&quot;</td><td>&quot;InformationTopics&quot;</td><td>5</td><td>5</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;photosynthesis&quot;</td><td>&quot;NN1&quot;</td><td>&quot;AcademicTerms&quot;</td><td>16</td><td>13</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;. &quot;</td><td>&quot;Y&quot;</td><td>&quot;Untagged&quot;</td><td>17</td><td>14</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;This &quot;</td><td>&quot;DD1&quot;</td><td>&quot;MetadiscourseCohesive&quot;</td><td>18</td><td>15</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;process &quot;</td><td>&quot;NN1&quot;</td><td>&quot;InformationTopics&quot;</td><td>19</td><td>16</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;occurs &quot;</td><td>&quot;VVZ&quot;</td><td>&quot;Narrative&quot;</td><td>20</td><td>17</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (20, 6)\n",
       "┌─────────────┬────────────────┬─────────┬───────────────────────┬────────┬───────┐\n",
       "│ doc_id      ┆ token          ┆ pos_tag ┆ ds_tag                ┆ pos_id ┆ ds_id │\n",
       "│ ---         ┆ ---            ┆ ---     ┆ ---                   ┆ ---    ┆ ---   │\n",
       "│ str         ┆ str            ┆ str     ┆ str                   ┆ u32    ┆ u32   │\n",
       "╞═════════════╪════════════════╪═════════╪═══════════════════════╪════════╪═══════╡\n",
       "│ acad_01.txt ┆ In             ┆ II      ┆ Untagged              ┆ 1      ┆ 1     │\n",
       "│ acad_01.txt ┆ the            ┆ AT      ┆ Untagged              ┆ 2      ┆ 2     │\n",
       "│ acad_01.txt ┆ field          ┆ NN1     ┆ Untagged              ┆ 3      ┆ 3     │\n",
       "│ acad_01.txt ┆ of             ┆ IO      ┆ Untagged              ┆ 4      ┆ 4     │\n",
       "│ acad_01.txt ┆ plant          ┆ NN1     ┆ InformationTopics     ┆ 5      ┆ 5     │\n",
       "│ …           ┆ …              ┆ …       ┆ …                     ┆ …      ┆ …     │\n",
       "│ acad_01.txt ┆ photosynthesis ┆ NN1     ┆ AcademicTerms         ┆ 16     ┆ 13    │\n",
       "│ acad_01.txt ┆ .              ┆ Y       ┆ Untagged              ┆ 17     ┆ 14    │\n",
       "│ acad_01.txt ┆ This           ┆ DD1     ┆ MetadiscourseCohesive ┆ 18     ┆ 15    │\n",
       "│ acad_01.txt ┆ process        ┆ NN1     ┆ InformationTopics     ┆ 19     ┆ 16    │\n",
       "│ acad_01.txt ┆ occurs         ┆ VVZ     ┆ Narrative             ┆ 20     ┆ 17    │\n",
       "└─────────────┴────────────────┴─────────┴───────────────────────┴────────┴───────┘"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds_tokens.head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3eed732e",
   "metadata": {},
   "source": [
    "## Frequency tables\n",
    "\n",
    "Frequency tables are produced by the `frequency_table` function, which takes the parsed tokens DataFrame and a `count_by` argument that is one of **'pos'** or **'ds'** for part-of-speech or DocuScope category.\n",
    "\n",
    "In addition to being trained on DocuScope, the spaCy model was trained on the [CLAWS7 tagset](https://ucrel.lancs.ac.uk/claws7tags.html). Those part-of-speech tags are the default counting method.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note: Normalizing**\n",
    "\n",
    "Earlier versions of the package required passing a token total to the function. That is no longer required, as all normalizing is carried out inside the function.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7b9e64f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "wc = ds.frequency_table(ds_tokens)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e307e001",
   "metadata": {},
   "source": [
    "The table has columns for tokens, tags, absolute frequency (AF), relative frequency (RF, per million tokens), and range (the percentage of texts in which the token appears):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "5ff63902",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;the&quot;</td><td>&quot;AT&quot;</td><td>9610</td><td>72382.989621</td><td>100.0</td></tr><tr><td>&quot;of&quot;</td><td>&quot;IO&quot;</td><td>5065</td><td>38149.827516</td><td>100.0</td></tr><tr><td>&quot;and&quot;</td><td>&quot;CC&quot;</td><td>3672</td><td>27657.683443</td><td>100.0</td></tr><tr><td>&quot;in&quot;</td><td>&quot;II&quot;</td><td>2853</td><td>21488.93542</td><td>100.0</td></tr><tr><td>&quot;a&quot;</td><td>&quot;AT1&quot;</td><td>2569</td><td>19349.833542</td><td>100.0</td></tr><tr><td>&quot;to&quot;</td><td>&quot;TO&quot;</td><td>2171</td><td>16352.078092</td><td>100.0</td></tr><tr><td>&quot;is&quot;</td><td>&quot;VBZ&quot;</td><td>1784</td><td>13437.17518</td><td>98.0</td></tr><tr><td>&quot;that&quot;</td><td>&quot;CST&quot;</td><td>1550</td><td>11674.675745</td><td>100.0</td></tr><tr><td>&quot;to&quot;</td><td>&quot;II&quot;</td><td>1324</td><td>9972.432701</td><td>100.0</td></tr><tr><td>&quot;for&quot;</td><td>&quot;IF&quot;</td><td>1097</td><td>8262.657608</td><td>100.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 5)\n",
       "┌───────┬─────┬──────┬──────────────┬───────┐\n",
       "│ Token ┆ Tag ┆ AF   ┆ RF           ┆ Range │\n",
       "│ ---   ┆ --- ┆ ---  ┆ ---          ┆ ---   │\n",
       "│ str   ┆ str ┆ u32  ┆ f64          ┆ f64   │\n",
       "╞═══════╪═════╪══════╪══════════════╪═══════╡\n",
       "│ the   ┆ AT  ┆ 9610 ┆ 72382.989621 ┆ 100.0 │\n",
       "│ of    ┆ IO  ┆ 5065 ┆ 38149.827516 ┆ 100.0 │\n",
       "│ and   ┆ CC  ┆ 3672 ┆ 27657.683443 ┆ 100.0 │\n",
       "│ in    ┆ II  ┆ 2853 ┆ 21488.93542  ┆ 100.0 │\n",
       "│ a     ┆ AT1 ┆ 2569 ┆ 19349.833542 ┆ 100.0 │\n",
       "│ to    ┆ TO  ┆ 2171 ┆ 16352.078092 ┆ 100.0 │\n",
       "│ is    ┆ VBZ ┆ 1784 ┆ 13437.17518  ┆ 98.0  │\n",
       "│ that  ┆ CST ┆ 1550 ┆ 11674.675745 ┆ 100.0 │\n",
       "│ to    ┆ II  ┆ 1324 ┆ 9972.432701  ┆ 100.0 │\n",
       "│ for   ┆ IF  ┆ 1097 ┆ 8262.657608  ┆ 100.0 │\n",
       "└───────┴─────┴──────┴──────────────┴───────┘"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wc.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01193b72",
   "metadata": {},
   "source": [
    "The resulting data frame is easy to filter and sort. So, here, we filter for an absolute frequency greater than 10 and tokens tagged as verbs (tags starting with 'V'):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a1ef1799",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (276, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;is&quot;</td><td>&quot;VBZ&quot;</td><td>1784</td><td>13437.17518</td><td>98.0</td></tr><tr><td>&quot;be&quot;</td><td>&quot;VBI&quot;</td><td>960</td><td>7230.766913</td><td>98.0</td></tr><tr><td>&quot;are&quot;</td><td>&quot;VBR&quot;</td><td>763</td><td>5746.953286</td><td>96.0</td></tr><tr><td>&quot;was&quot;</td><td>&quot;VBDZ&quot;</td><td>594</td><td>4474.037028</td><td>92.0</td></tr><tr><td>&quot;will&quot;</td><td>&quot;VM&quot;</td><td>512</td><td>3856.40902</td><td>82.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;take&quot;</td><td>&quot;VV0&quot;</td><td>11</td><td>82.852538</td><td>14.0</td></tr><tr><td>&quot;test&quot;</td><td>&quot;VVI&quot;</td><td>11</td><td>82.852538</td><td>12.0</td></tr><tr><td>&quot;want&quot;</td><td>&quot;VV0&quot;</td><td>11</td><td>82.852538</td><td>14.0</td></tr><tr><td>&quot;work&quot;</td><td>&quot;VV0&quot;</td><td>11</td><td>82.852538</td><td>12.0</td></tr><tr><td>&quot;written&quot;</td><td>&quot;VVN&quot;</td><td>11</td><td>82.852538</td><td>16.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (276, 5)\n",
       "┌─────────┬──────┬──────┬─────────────┬───────┐\n",
       "│ Token   ┆ Tag  ┆ AF   ┆ RF          ┆ Range │\n",
       "│ ---     ┆ ---  ┆ ---  ┆ ---         ┆ ---   │\n",
       "│ str     ┆ str  ┆ u32  ┆ f64         ┆ f64   │\n",
       "╞═════════╪══════╪══════╪═════════════╪═══════╡\n",
       "│ is      ┆ VBZ  ┆ 1784 ┆ 13437.17518 ┆ 98.0  │\n",
       "│ be      ┆ VBI  ┆ 960  ┆ 7230.766913 ┆ 98.0  │\n",
       "│ are     ┆ VBR  ┆ 763  ┆ 5746.953286 ┆ 96.0  │\n",
       "│ was     ┆ VBDZ ┆ 594  ┆ 4474.037028 ┆ 92.0  │\n",
       "│ will    ┆ VM   ┆ 512  ┆ 3856.40902  ┆ 82.0  │\n",
       "│ …       ┆ …    ┆ …    ┆ …           ┆ …     │\n",
       "│ take    ┆ VV0  ┆ 11   ┆ 82.852538   ┆ 14.0  │\n",
       "│ test    ┆ VVI  ┆ 11   ┆ 82.852538   ┆ 12.0  │\n",
       "│ want    ┆ VV0  ┆ 11   ┆ 82.852538   ┆ 14.0  │\n",
       "│ work    ┆ VV0  ┆ 11   ┆ 82.852538   ┆ 12.0  │\n",
       "│ written ┆ VVN  ┆ 11   ┆ 82.852538   ┆ 16.0  │\n",
       "└─────────┴──────┴──────┴─────────────┴───────┘"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wc.filter(\n",
    "    (pl.col(\"AF\") > 10) &\n",
    "    (pl.col(\"Tag\").str.starts_with(\"V\"))\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a20a89e4",
   "metadata": {},
   "source": [
    "Here, we filter for adverbs (tags starting with 'R'). Note that multi-word units tagged as a sequence are aggregated into a single token (like *et al*):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "352e53c9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (685, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;also&quot;</td><td>&quot;RR&quot;</td><td>302</td><td>2274.678758</td><td>98.0</td></tr><tr><td>&quot;more&quot;</td><td>&quot;RGR&quot;</td><td>255</td><td>1920.672461</td><td>82.0</td></tr><tr><td>&quot;et al&quot;</td><td>&quot;RA&quot;</td><td>201</td><td>1513.941822</td><td>12.0</td></tr><tr><td>&quot;however&quot;</td><td>&quot;RR&quot;</td><td>184</td><td>1385.896992</td><td>80.0</td></tr><tr><td>&quot;only&quot;</td><td>&quot;RR&quot;</td><td>159</td><td>1197.59577</td><td>84.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;wholeheartedly&quot;</td><td>&quot;RR&quot;</td><td>1</td><td>7.532049</td><td>2.0</td></tr><tr><td>&quot;wholly&quot;</td><td>&quot;RR&quot;</td><td>1</td><td>7.532049</td><td>2.0</td></tr><tr><td>&quot;wirelessly&quot;</td><td>&quot;RR&quot;</td><td>1</td><td>7.532049</td><td>2.0</td></tr><tr><td>&quot;wonderfully&quot;</td><td>&quot;RR&quot;</td><td>1</td><td>7.532049</td><td>2.0</td></tr><tr><td>&quot;worldwide&quot;</td><td>&quot;RL&quot;</td><td>1</td><td>7.532049</td><td>2.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (685, 5)\n",
       "┌────────────────┬─────┬─────┬─────────────┬───────┐\n",
       "│ Token          ┆ Tag ┆ AF  ┆ RF          ┆ Range │\n",
       "│ ---            ┆ --- ┆ --- ┆ ---         ┆ ---   │\n",
       "│ str            ┆ str ┆ u32 ┆ f64         ┆ f64   │\n",
       "╞════════════════╪═════╪═════╪═════════════╪═══════╡\n",
       "│ also           ┆ RR  ┆ 302 ┆ 2274.678758 ┆ 98.0  │\n",
       "│ more           ┆ RGR ┆ 255 ┆ 1920.672461 ┆ 82.0  │\n",
       "│ et al          ┆ RA  ┆ 201 ┆ 1513.941822 ┆ 12.0  │\n",
       "│ however        ┆ RR  ┆ 184 ┆ 1385.896992 ┆ 80.0  │\n",
       "│ only           ┆ RR  ┆ 159 ┆ 1197.59577  ┆ 84.0  │\n",
       "│ …              ┆ …   ┆ …   ┆ …           ┆ …     │\n",
       "│ wholeheartedly ┆ RR  ┆ 1   ┆ 7.532049    ┆ 2.0   │\n",
       "│ wholly         ┆ RR  ┆ 1   ┆ 7.532049    ┆ 2.0   │\n",
       "│ wirelessly     ┆ RR  ┆ 1   ┆ 7.532049    ┆ 2.0   │\n",
       "│ wonderfully    ┆ RR  ┆ 1   ┆ 7.532049    ┆ 2.0   │\n",
       "│ worldwide      ┆ RL  ┆ 1   ┆ 7.532049    ┆ 2.0   │\n",
       "└────────────────┴─────┴─────┴─────────────┴───────┘"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wc.filter(\n",
    "    pl.col(\"Tag\").str.starts_with(\"R\")\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bcd1ac5",
   "metadata": {},
   "source": [
    "Similarly, we can generate a frequency table of DocuScope tokens by setting `count_by='ds'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "0d3ac718",
   "metadata": {},
   "outputs": [],
   "source": [
    "wc = ds.frequency_table(ds_tokens, count_by='ds')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "811ef069",
   "metadata": {},
   "source": [
    "Most function words in isolation are not tagged by DocuScope (as they don't carry clear rhetorical meaning on their own)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "d12b9c79",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;the&quot;</td><td>&quot;Untagged&quot;</td><td>5686</td><td>52226.947488</td><td>100.0</td></tr><tr><td>&quot;and&quot;</td><td>&quot;Untagged&quot;</td><td>3506</td><td>32203.249718</td><td>100.0</td></tr><tr><td>&quot;of&quot;</td><td>&quot;Untagged&quot;</td><td>3148</td><td>28914.954396</td><td>100.0</td></tr><tr><td>&quot;in&quot;</td><td>&quot;Untagged&quot;</td><td>1935</td><td>17773.328067</td><td>100.0</td></tr><tr><td>&quot;to&quot;</td><td>&quot;Untagged&quot;</td><td>1705</td><td>15660.736101</td><td>100.0</td></tr><tr><td>&quot;a&quot;</td><td>&quot;Untagged&quot;</td><td>1452</td><td>13336.884937</td><td>100.0</td></tr><tr><td>&quot;that&quot;</td><td>&quot;Untagged&quot;</td><td>891</td><td>8183.997575</td><td>98.0</td></tr><tr><td>&quot;for&quot;</td><td>&quot;Untagged&quot;</td><td>749</td><td>6879.701665</td><td>98.0</td></tr><tr><td>&quot;as&quot;</td><td>&quot;Untagged&quot;</td><td>638</td><td>5860.146412</td><td>100.0</td></tr><tr><td>&quot;with&quot;</td><td>&quot;Untagged&quot;</td><td>610</td><td>5602.961303</td><td>100.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 5)\n",
       "┌───────┬──────────┬──────┬──────────────┬───────┐\n",
       "│ Token ┆ Tag      ┆ AF   ┆ RF           ┆ Range │\n",
       "│ ---   ┆ ---      ┆ ---  ┆ ---          ┆ ---   │\n",
       "│ str   ┆ str      ┆ u32  ┆ f64          ┆ f64   │\n",
       "╞═══════╪══════════╪══════╪══════════════╪═══════╡\n",
       "│ the   ┆ Untagged ┆ 5686 ┆ 52226.947488 ┆ 100.0 │\n",
       "│ and   ┆ Untagged ┆ 3506 ┆ 32203.249718 ┆ 100.0 │\n",
       "│ of    ┆ Untagged ┆ 3148 ┆ 28914.954396 ┆ 100.0 │\n",
       "│ in    ┆ Untagged ┆ 1935 ┆ 17773.328067 ┆ 100.0 │\n",
       "│ to    ┆ Untagged ┆ 1705 ┆ 15660.736101 ┆ 100.0 │\n",
       "│ a     ┆ Untagged ┆ 1452 ┆ 13336.884937 ┆ 100.0 │\n",
       "│ that  ┆ Untagged ┆ 891  ┆ 8183.997575  ┆ 98.0  │\n",
       "│ for   ┆ Untagged ┆ 749  ┆ 6879.701665  ┆ 98.0  │\n",
       "│ as    ┆ Untagged ┆ 638  ┆ 5860.146412  ┆ 100.0 │\n",
       "│ with  ┆ Untagged ┆ 610  ┆ 5602.961303  ┆ 100.0 │\n",
       "└───────┴──────────┴──────┴──────────────┴───────┘"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wc.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f4c854c",
   "metadata": {},
   "source": [
    "However, these same function words may appear in recognized phrases. This also means that the count of *the* does not include all occurrences of the token."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "77ad350a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (20, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;the same&quot;</td><td>&quot;InformationExposition&quot;</td><td>35</td><td>321.481386</td><td>36.0</td></tr><tr><td>&quot;the most&quot;</td><td>&quot;ForceStressed&quot;</td><td>33</td><td>303.111021</td><td>38.0</td></tr><tr><td>&quot;the study&quot;</td><td>&quot;AcademicTerms&quot;</td><td>29</td><td>266.370291</td><td>4.0</td></tr><tr><td>&quot;the united states&quot;</td><td>&quot;InformationPlace&quot;</td><td>25</td><td>229.629562</td><td>22.0</td></tr><tr><td>&quot;the current&quot;</td><td>&quot;Narrative&quot;</td><td>22</td><td>202.074014</td><td>20.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;the community&quot;</td><td>&quot;PublicTerms&quot;</td><td>14</td><td>128.592554</td><td>8.0</td></tr><tr><td>&quot;the court&quot;</td><td>&quot;PublicTerms&quot;</td><td>14</td><td>128.592554</td><td>4.0</td></tr><tr><td>&quot;the second&quot;</td><td>&quot;InformationExposition&quot;</td><td>14</td><td>128.592554</td><td>18.0</td></tr><tr><td>&quot;the importance of&quot;</td><td>&quot;AcademicWritingMoves&quot;</td><td>13</td><td>119.407372</td><td>18.0</td></tr><tr><td>&quot;the people&quot;</td><td>&quot;Character&quot;</td><td>13</td><td>119.407372</td><td>12.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (20, 5)\n",
       "┌───────────────────┬───────────────────────┬─────┬────────────┬───────┐\n",
       "│ Token             ┆ Tag                   ┆ AF  ┆ RF         ┆ Range │\n",
       "│ ---               ┆ ---                   ┆ --- ┆ ---        ┆ ---   │\n",
       "│ str               ┆ str                   ┆ u32 ┆ f64        ┆ f64   │\n",
       "╞═══════════════════╪═══════════════════════╪═════╪════════════╪═══════╡\n",
       "│ the same          ┆ InformationExposition ┆ 35  ┆ 321.481386 ┆ 36.0  │\n",
       "│ the most          ┆ ForceStressed         ┆ 33  ┆ 303.111021 ┆ 38.0  │\n",
       "│ the study         ┆ AcademicTerms         ┆ 29  ┆ 266.370291 ┆ 4.0   │\n",
       "│ the united states ┆ InformationPlace      ┆ 25  ┆ 229.629562 ┆ 22.0  │\n",
       "│ the current       ┆ Narrative             ┆ 22  ┆ 202.074014 ┆ 20.0  │\n",
       "│ …                 ┆ …                     ┆ …   ┆ …          ┆ …     │\n",
       "│ the community     ┆ PublicTerms           ┆ 14  ┆ 128.592554 ┆ 8.0   │\n",
       "│ the court         ┆ PublicTerms           ┆ 14  ┆ 128.592554 ┆ 4.0   │\n",
       "│ the second        ┆ InformationExposition ┆ 14  ┆ 128.592554 ┆ 18.0  │\n",
       "│ the importance of ┆ AcademicWritingMoves  ┆ 13  ┆ 119.407372 ┆ 18.0  │\n",
       "│ the people        ┆ Character             ┆ 13  ┆ 119.407372 ┆ 12.0  │\n",
       "└───────────────────┴───────────────────────┴─────┴────────────┴───────┘"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wc.filter(\n",
    "    pl.col(\"Token\").str.starts_with(\"the \")\n",
    "    ).head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d8a02f1",
   "metadata": {},
   "source": [
    "As with part-of-speech tags, we can easily filter the data frame for the desired [DocuScope category](https://docuscospacy.readthedocs.io/en/latest/docuscope.html#Categories). Here, we filter for 'Character':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "8b3d4a3a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (20, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;their&quot;</td><td>&quot;Character&quot;</td><td>335</td><td>3077.036125</td><td>88.0</td></tr><tr><td>&quot;his&quot;</td><td>&quot;Character&quot;</td><td>239</td><td>2195.258609</td><td>52.0</td></tr><tr><td>&quot;he&quot;</td><td>&quot;Character&quot;</td><td>135</td><td>1239.999633</td><td>48.0</td></tr><tr><td>&quot;students&quot;</td><td>&quot;Character&quot;</td><td>129</td><td>1184.888538</td><td>18.0</td></tr><tr><td>&quot;participants&quot;</td><td>&quot;Character&quot;</td><td>106</td><td>973.629341</td><td>14.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;religious&quot;</td><td>&quot;Character&quot;</td><td>54</td><td>495.999853</td><td>16.0</td></tr><tr><td>&quot;self&quot;</td><td>&quot;Character&quot;</td><td>54</td><td>495.999853</td><td>28.0</td></tr><tr><td>&quot;women&quot;</td><td>&quot;Character&quot;</td><td>51</td><td>468.444306</td><td>20.0</td></tr><tr><td>&quot;jews&quot;</td><td>&quot;Character&quot;</td><td>45</td><td>413.333211</td><td>6.0</td></tr><tr><td>&quot;adult&quot;</td><td>&quot;Character&quot;</td><td>44</td><td>404.148028</td><td>8.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (20, 5)\n",
       "┌──────────────┬───────────┬─────┬─────────────┬───────┐\n",
       "│ Token        ┆ Tag       ┆ AF  ┆ RF          ┆ Range │\n",
       "│ ---          ┆ ---       ┆ --- ┆ ---         ┆ ---   │\n",
       "│ str          ┆ str       ┆ u32 ┆ f64         ┆ f64   │\n",
       "╞══════════════╪═══════════╪═════╪═════════════╪═══════╡\n",
       "│ their        ┆ Character ┆ 335 ┆ 3077.036125 ┆ 88.0  │\n",
       "│ his          ┆ Character ┆ 239 ┆ 2195.258609 ┆ 52.0  │\n",
       "│ he           ┆ Character ┆ 135 ┆ 1239.999633 ┆ 48.0  │\n",
       "│ students     ┆ Character ┆ 129 ┆ 1184.888538 ┆ 18.0  │\n",
       "│ participants ┆ Character ┆ 106 ┆ 973.629341  ┆ 14.0  │\n",
       "│ …            ┆ …         ┆ …   ┆ …           ┆ …     │\n",
       "│ religious    ┆ Character ┆ 54  ┆ 495.999853  ┆ 16.0  │\n",
       "│ self         ┆ Character ┆ 54  ┆ 495.999853  ┆ 28.0  │\n",
       "│ women        ┆ Character ┆ 51  ┆ 468.444306  ┆ 20.0  │\n",
       "│ jews         ┆ Character ┆ 45  ┆ 413.333211  ┆ 6.0   │\n",
       "│ adult        ┆ Character ┆ 44  ┆ 404.148028  ┆ 8.0   │\n",
       "└──────────────┴───────────┴─────┴─────────────┴───────┘"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wc.filter(\n",
    "    pl.col(\"Tag\").str.starts_with(\"Character\")\n",
    "    ).head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59b00e70",
   "metadata": {},
   "source": [
    "Or by 'Public Terms':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "fadee6e8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (20, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;national&quot;</td><td>&quot;PublicTerms&quot;</td><td>100</td><td>918.518246</td><td>32.0</td></tr><tr><td>&quot;political&quot;</td><td>&quot;PublicTerms&quot;</td><td>63</td><td>578.666495</td><td>24.0</td></tr><tr><td>&quot;society&quot;</td><td>&quot;PublicTerms&quot;</td><td>54</td><td>495.999853</td><td>28.0</td></tr><tr><td>&quot;citizenship&quot;</td><td>&quot;PublicTerms&quot;</td><td>53</td><td>486.814671</td><td>6.0</td></tr><tr><td>&quot;population&quot;</td><td>&quot;PublicTerms&quot;</td><td>45</td><td>413.333211</td><td>28.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;institutions&quot;</td><td>&quot;PublicTerms&quot;</td><td>21</td><td>192.888832</td><td>10.0</td></tr><tr><td>&quot;authority&quot;</td><td>&quot;PublicTerms&quot;</td><td>20</td><td>183.703649</td><td>18.0</td></tr><tr><td>&quot;amendment&quot;</td><td>&quot;PublicTerms&quot;</td><td>19</td><td>174.518467</td><td>6.0</td></tr><tr><td>&quot;majority of&quot;</td><td>&quot;PublicTerms&quot;</td><td>19</td><td>174.518467</td><td>24.0</td></tr><tr><td>&quot;association&quot;</td><td>&quot;PublicTerms&quot;</td><td>18</td><td>165.333284</td><td>20.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (20, 5)\n",
       "┌──────────────┬─────────────┬─────┬────────────┬───────┐\n",
       "│ Token        ┆ Tag         ┆ AF  ┆ RF         ┆ Range │\n",
       "│ ---          ┆ ---         ┆ --- ┆ ---        ┆ ---   │\n",
       "│ str          ┆ str         ┆ u32 ┆ f64        ┆ f64   │\n",
       "╞══════════════╪═════════════╪═════╪════════════╪═══════╡\n",
       "│ national     ┆ PublicTerms ┆ 100 ┆ 918.518246 ┆ 32.0  │\n",
       "│ political    ┆ PublicTerms ┆ 63  ┆ 578.666495 ┆ 24.0  │\n",
       "│ society      ┆ PublicTerms ┆ 54  ┆ 495.999853 ┆ 28.0  │\n",
       "│ citizenship  ┆ PublicTerms ┆ 53  ┆ 486.814671 ┆ 6.0   │\n",
       "│ population   ┆ PublicTerms ┆ 45  ┆ 413.333211 ┆ 28.0  │\n",
       "│ …            ┆ …           ┆ …   ┆ …          ┆ …     │\n",
       "│ institutions ┆ PublicTerms ┆ 21  ┆ 192.888832 ┆ 10.0  │\n",
       "│ authority    ┆ PublicTerms ┆ 20  ┆ 183.703649 ┆ 18.0  │\n",
       "│ amendment    ┆ PublicTerms ┆ 19  ┆ 174.518467 ┆ 6.0   │\n",
       "│ majority of  ┆ PublicTerms ┆ 19  ┆ 174.518467 ┆ 24.0  │\n",
       "│ association  ┆ PublicTerms ┆ 18  ┆ 165.333284 ┆ 20.0  │\n",
       "└──────────────┴─────────────┴─────┴────────────┴───────┘"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wc.filter(\n",
    "    pl.col(\"Tag\").str.starts_with(\"Public\")\n",
    "    ).head(20)"
   ]
  },
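  {
   "cell_type": "markdown",
   "id": "e1a7c3f0",
   "metadata": {},
   "source": [
    "The same filtering pattern can be tried on a small, hand-made table. The sketch below is only an illustration: the `toy` data frame and its values are invented, though its columns mirror some of those in the frequency table above. It keeps the rows whose tag starts with a given prefix and sorts them by count.\n",
    "\n",
    "```python\n",
    "import polars as pl\n",
    "\n",
    "# Invented values; only the column layout mirrors the frequency table.\n",
    "toy = pl.DataFrame({\n",
    "    'Token': ['their', 'national', 'students', 'society'],\n",
    "    'Tag': ['Character', 'PublicTerms', 'Character', 'PublicTerms'],\n",
    "    'AF': [335, 100, 129, 54],\n",
    "})\n",
    "\n",
    "# Keep rows whose tag begins with 'Character', highest count first.\n",
    "character = (\n",
    "    toy.filter(pl.col('Tag').str.starts_with('Character'))\n",
    "    .sort('AF', descending=True)\n",
    ")\n",
    "print(character)\n",
    "```\n",
    "\n",
    "Because `starts_with` matches a prefix, a shorter string such as 'Public' is enough to select 'PublicTerms'."
   ]
  },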
  {
   "cell_type": "markdown",
   "id": "9c2b0d5e",
   "metadata": {},
   "source": [
    "## Tags tables\n",
    "\n",
    "Rather than counting tokens, we can generate counts of the tags **only** by using the `tags_table` function. It works much like the `frequency_table` function, taking the parsed tokens (here, `ds_tokens`) and a `count_by` argument of either 'pos' or 'ds'. No normalizing total needs to be passed, since normalization takes place inside the function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "344bb7e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "tc = ds.tags_table(ds_tokens)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "03279676",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;NN1&quot;</td><td>24030</td><td>18.099513</td><td>100.0</td></tr><tr><td>&quot;JJ&quot;</td><td>11392</td><td>8.58051</td><td>100.0</td></tr><tr><td>&quot;AT&quot;</td><td>9725</td><td>7.324918</td><td>100.0</td></tr><tr><td>&quot;II&quot;</td><td>9492</td><td>7.149421</td><td>100.0</td></tr><tr><td>&quot;NN2&quot;</td><td>9146</td><td>6.888812</td><td>100.0</td></tr><tr><td>&quot;IO&quot;</td><td>5065</td><td>3.814983</td><td>100.0</td></tr><tr><td>&quot;NP1&quot;</td><td>4251</td><td>3.201874</td><td>98.0</td></tr><tr><td>&quot;CC&quot;</td><td>4184</td><td>3.151409</td><td>100.0</td></tr><tr><td>&quot;RR&quot;</td><td>4161</td><td>3.134086</td><td>100.0</td></tr><tr><td>&quot;VVI&quot;</td><td>3246</td><td>2.444903</td><td>100.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 4)\n",
       "┌─────┬───────┬───────────┬───────┐\n",
       "│ Tag ┆ AF    ┆ RF        ┆ Range │\n",
       "│ --- ┆ ---   ┆ ---       ┆ ---   │\n",
       "│ str ┆ u32   ┆ f64       ┆ f64   │\n",
       "╞═════╪═══════╪═══════════╪═══════╡\n",
       "│ NN1 ┆ 24030 ┆ 18.099513 ┆ 100.0 │\n",
       "│ JJ  ┆ 11392 ┆ 8.58051   ┆ 100.0 │\n",
       "│ AT  ┆ 9725  ┆ 7.324918  ┆ 100.0 │\n",
       "│ II  ┆ 9492  ┆ 7.149421  ┆ 100.0 │\n",
       "│ NN2 ┆ 9146  ┆ 6.888812  ┆ 100.0 │\n",
       "│ IO  ┆ 5065  ┆ 3.814983  ┆ 100.0 │\n",
       "│ NP1 ┆ 4251  ┆ 3.201874  ┆ 98.0  │\n",
       "│ CC  ┆ 4184  ┆ 3.151409  ┆ 100.0 │\n",
       "│ RR  ┆ 4161  ┆ 3.134086  ┆ 100.0 │\n",
       "│ VVI ┆ 3246  ┆ 2.444903  ┆ 100.0 │\n",
       "└─────┴───────┴───────────┴───────┘"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tc.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b301c5f2",
   "metadata": {},
   "source": [
    "And by DocuScope category:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "11a3d9c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "dc = ds.tags_table(ds_tokens, count_by=\"ds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "5264e8a0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Tag</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;Untagged&quot;</td><td>36990</td><td>33.98036</td><td>100.0</td></tr><tr><td>&quot;AcademicTerms&quot;</td><td>9245</td><td>8.492793</td><td>100.0</td></tr><tr><td>&quot;Character&quot;</td><td>7945</td><td>7.298566</td><td>100.0</td></tr><tr><td>&quot;Narrative&quot;</td><td>6840</td><td>6.283473</td><td>100.0</td></tr><tr><td>&quot;Description&quot;</td><td>6536</td><td>6.004207</td><td>100.0</td></tr><tr><td>&quot;InformationExposition&quot;</td><td>4982</td><td>4.576646</td><td>100.0</td></tr><tr><td>&quot;InformationTopics&quot;</td><td>3729</td><td>3.425595</td><td>98.0</td></tr><tr><td>&quot;Negative&quot;</td><td>3679</td><td>3.379663</td><td>100.0</td></tr><tr><td>&quot;Positive&quot;</td><td>3045</td><td>2.797248</td><td>100.0</td></tr><tr><td>&quot;MetadiscourseCohesive&quot;</td><td>2451</td><td>2.251578</td><td>100.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 4)\n",
       "┌───────────────────────┬───────┬──────────┬───────┐\n",
       "│ Tag                   ┆ AF    ┆ RF       ┆ Range │\n",
       "│ ---                   ┆ ---   ┆ ---      ┆ ---   │\n",
       "│ str                   ┆ u32   ┆ f64      ┆ f64   │\n",
       "╞═══════════════════════╪═══════╪══════════╪═══════╡\n",
       "│ Untagged              ┆ 36990 ┆ 33.98036 ┆ 100.0 │\n",
       "│ AcademicTerms         ┆ 9245  ┆ 8.492793 ┆ 100.0 │\n",
       "│ Character             ┆ 7945  ┆ 7.298566 ┆ 100.0 │\n",
       "│ Narrative             ┆ 6840  ┆ 6.283473 ┆ 100.0 │\n",
       "│ Description           ┆ 6536  ┆ 6.004207 ┆ 100.0 │\n",
       "│ InformationExposition ┆ 4982  ┆ 4.576646 ┆ 100.0 │\n",
       "│ InformationTopics     ┆ 3729  ┆ 3.425595 ┆ 98.0  │\n",
       "│ Negative              ┆ 3679  ┆ 3.379663 ┆ 100.0 │\n",
       "│ Positive              ┆ 3045  ┆ 2.797248 ┆ 100.0 │\n",
       "│ MetadiscourseCohesive ┆ 2451  ┆ 2.251578 ┆ 100.0 │\n",
       "└───────────────────────┴───────┴──────────┴───────┘"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dc.head(10)"
   ]
  },
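  {
   "cell_type": "markdown",
   "id": "b72d09a4",
   "metadata": {},
   "source": [
    "To make the AF, RF, and Range columns more concrete, here is a minimal sketch of how such values can be derived from per-document tag counts. It is not the package's implementation: the counts are invented, and treating RF as a percentage of all tags and Range as the percentage of documents containing a tag is an assumption based on the conventional definitions.\n",
    "\n",
    "```python\n",
    "import polars as pl\n",
    "\n",
    "# Invented per-document tag counts (one row per document and tag).\n",
    "counts = pl.DataFrame({\n",
    "    'doc_id': ['a', 'a', 'b', 'b', 'c'],\n",
    "    'Tag': ['NN1', 'JJ', 'NN1', 'JJ', 'NN1'],\n",
    "    'n': [6, 2, 3, 1, 8],\n",
    "})\n",
    "\n",
    "total = counts['n'].sum()\n",
    "n_docs = counts['doc_id'].n_unique()\n",
    "\n",
    "table = (\n",
    "    counts.group_by('Tag')\n",
    "    .agg(\n",
    "        pl.col('n').sum().alias('AF'),\n",
    "        pl.col('doc_id').n_unique().alias('docs'),\n",
    "    )\n",
    "    .with_columns(\n",
    "        (pl.col('AF') / total * 100).alias('RF'),\n",
    "        (pl.col('docs') / n_docs * 100).alias('Range'),\n",
    "    )\n",
    "    .drop('docs')\n",
    "    .sort('AF', descending=True)\n",
    ")\n",
    "print(table)\n",
    "```\n",
    "\n",
    "Doing the normalization inside one expression chain keeps the totals and the counts in step, which is the motivation given at the top of this page for no longer passing totals by hand."
   ]
  },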
  {
   "cell_type": "markdown",
   "id": "c134f7e4",
   "metadata": {},
   "source": [
    "## Dispersions\n",
    "\n",
    "The `frequency_table` function includes 'Range' as a rudimentary measure of how tokens are distributed. For more advanced measures, you can use the `dispersions_table` function. This function includes common measures like Gries' [Deviation of Proportions](https://www.stgries.info/research/2010_STG_DispersionAdjFreq_CorpLingAppl.pdf)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "0bfd90c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "dsp = ds.dispersions_table(ds_tokens, count_by=\"pos\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "d6807bdc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 11)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>AF</th><th>RF</th><th>Carrolls_D2</th><th>Rosengrens_S</th><th>Lynes_D3</th><th>DC</th><th>Juillands_D</th><th>DP</th><th>DP_norm</th></tr><tr><td>str</td><td>str</td><td>u64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;the&quot;</td><td>&quot;AT&quot;</td><td>9610</td><td>72382.989621</td><td>0.964601</td><td>0.984981</td><td>0.930806</td><td>0.929015</td><td>0.967197</td><td>0.102275</td><td>0.102698</td></tr><tr><td>&quot;of&quot;</td><td>&quot;IO&quot;</td><td>5065</td><td>38149.827516</td><td>0.947715</td><td>0.984078</td><td>0.883843</td><td>0.90022</td><td>0.955746</td><td>0.095509</td><td>0.095904</td></tr><tr><td>&quot;and&quot;</td><td>&quot;CC&quot;</td><td>3672</td><td>27657.683443</td><td>0.928468</td><td>0.978108</td><td>0.821805</td><td>0.869744</td><td>0.957209</td><td>0.124252</td><td>0.124766</td></tr><tr><td>&quot;in&quot;</td><td>&quot;II&quot;</td><td>2959</td><td>22287.3326</td><td>0.930874</td><td>0.978738</td><td>0.844625</td><td>0.868134</td><td>0.953631</td><td>0.116709</td><td>0.117192</td></tr><tr><td>&quot;a&quot;</td><td>&quot;AT1&quot;</td><td>2572</td><td>19372.429688</td><td>0.945612</td><td>0.981248</td><td>0.886344</td><td>0.893346</td><td>0.960714</td><td>0.114134</td><td>0.114607</td></tr><tr><td>&quot;to&quot;</td><td>&quot;TO&quot;</td><td>2171</td><td>16352.078092</td><td>0.951199</td><td>0.972768</td><td>0.899994</td><td>0.903728</td><td>0.949974</td><td>0.131491</td><td>0.132035</td></tr><tr><td>&quot;is&quot;</td><td>&quot;VBZ&quot;</td><td>1784</td><td>13437.17518</td><td>0.919229</td><td>0.928686</td><td>0.831238</td><td>0.831865</td><td>0.922917</td><td>0.194194</td><td>0.194997</td></tr><tr><td>&quot;that&quot;</td><td>&quot;CST&quot;</td><td>1550</td><td>11674.675745</td><td>0.927448</td><td>0.956544</td><td>0.8477
84</td><td>0.855659</td><td>0.923811</td><td>0.156775</td><td>0.157424</td></tr><tr><td>&quot;to&quot;</td><td>&quot;II&quot;</td><td>1324</td><td>9972.432701</td><td>0.938721</td><td>0.987034</td><td>0.85423</td><td>0.885227</td><td>0.963669</td><td>0.097986</td><td>0.098392</td></tr><tr><td>&quot;for&quot;</td><td>&quot;IF&quot;</td><td>1099</td><td>8277.721706</td><td>0.941273</td><td>0.954536</td><td>0.875632</td><td>0.883362</td><td>0.933182</td><td>0.184637</td><td>0.185401</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 11)\n",
       "┌───────┬─────┬──────┬──────────────┬───┬──────────┬─────────────┬──────────┬──────────┐\n",
       "│ Token ┆ Tag ┆ AF   ┆ RF           ┆ … ┆ DC       ┆ Juillands_D ┆ DP       ┆ DP_norm  │\n",
       "│ ---   ┆ --- ┆ ---  ┆ ---          ┆   ┆ ---      ┆ ---         ┆ ---      ┆ ---      │\n",
       "│ str   ┆ str ┆ u64  ┆ f64          ┆   ┆ f64      ┆ f64         ┆ f64      ┆ f64      │\n",
       "╞═══════╪═════╪══════╪══════════════╪═══╪══════════╪═════════════╪══════════╪══════════╡\n",
       "│ the   ┆ AT  ┆ 9610 ┆ 72382.989621 ┆ … ┆ 0.929015 ┆ 0.967197    ┆ 0.102275 ┆ 0.102698 │\n",
       "│ of    ┆ IO  ┆ 5065 ┆ 38149.827516 ┆ … ┆ 0.90022  ┆ 0.955746    ┆ 0.095509 ┆ 0.095904 │\n",
       "│ and   ┆ CC  ┆ 3672 ┆ 27657.683443 ┆ … ┆ 0.869744 ┆ 0.957209    ┆ 0.124252 ┆ 0.124766 │\n",
       "│ in    ┆ II  ┆ 2959 ┆ 22287.3326   ┆ … ┆ 0.868134 ┆ 0.953631    ┆ 0.116709 ┆ 0.117192 │\n",
       "│ a     ┆ AT1 ┆ 2572 ┆ 19372.429688 ┆ … ┆ 0.893346 ┆ 0.960714    ┆ 0.114134 ┆ 0.114607 │\n",
       "│ to    ┆ TO  ┆ 2171 ┆ 16352.078092 ┆ … ┆ 0.903728 ┆ 0.949974    ┆ 0.131491 ┆ 0.132035 │\n",
       "│ is    ┆ VBZ ┆ 1784 ┆ 13437.17518  ┆ … ┆ 0.831865 ┆ 0.922917    ┆ 0.194194 ┆ 0.194997 │\n",
       "│ that  ┆ CST ┆ 1550 ┆ 11674.675745 ┆ … ┆ 0.855659 ┆ 0.923811    ┆ 0.156775 ┆ 0.157424 │\n",
       "│ to    ┆ II  ┆ 1324 ┆ 9972.432701  ┆ … ┆ 0.885227 ┆ 0.963669    ┆ 0.097986 ┆ 0.098392 │\n",
       "│ for   ┆ IF  ┆ 1099 ┆ 8277.721706  ┆ … ┆ 0.883362 ┆ 0.933182    ┆ 0.184637 ┆ 0.185401 │\n",
       "└───────┴─────┴──────┴──────────────┴───┴──────────┴─────────────┴──────────┴──────────┘"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dsp.head(10)"
   ]
  },
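  {
   "cell_type": "markdown",
   "id": "c40e8d15",
   "metadata": {},
   "source": [
    "As a rough illustration of what the DP column measures, the sketch below computes Gries' Deviation of Proportions for a single item from invented counts. For each corpus part, it compares the share of the item's occurrences found there with the share of the corpus that the part makes up, then halves the summed absolute differences. The normalized variant divides by one minus the smallest part share. The function name and its inputs are hypothetical, and the package may compute these values differently.\n",
    "\n",
    "```python\n",
    "def deviation_of_proportions(item_counts, part_sizes):\n",
    "    # Return (DP, DP_norm) for one item across corpus parts.\n",
    "    total_item = sum(item_counts)\n",
    "    total_size = sum(part_sizes)\n",
    "    # Expected share: each part's size relative to the whole corpus.\n",
    "    expected = [size / total_size for size in part_sizes]\n",
    "    # Observed share: the item's occurrences in each part.\n",
    "    observed = [count / total_item for count in item_counts]\n",
    "    dp = 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))\n",
    "    dp_norm = dp / (1 - min(expected))\n",
    "    return dp, dp_norm\n",
    "\n",
    "# An item spread in proportion to the part sizes.\n",
    "print(deviation_of_proportions([1, 1, 2], [100, 100, 200]))\n",
    "# An item confined to one small part.\n",
    "print(deviation_of_proportions([4, 0, 0], [100, 100, 200]))\n",
    "```\n",
    "\n",
    "Values near 0 indicate an even distribution, while values near 1 indicate that the item is concentrated in a small portion of the corpus."
   ]
  },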
  {
   "cell_type": "markdown",
   "id": "c9dac851",
   "metadata": {},
   "source": [
    "## Ngrams and clusters\n",
    "\n",
    "Because of the increased efficiency of polars, these functions have been updated and now include options for both ngrams and clusters, using a distinction that will be familiar to users of [AntConc](https://www.laurenceanthony.net/software/antconc/releases/AntConc324/help.pdf).\n",
    "\n",
    "### Ngrams\n",
    "\n",
    "Ngrams are simply the most frequent token sequences, from 2 to 5 tokens in length. The `ngrams` function will filter for a minimum frequency. (The default is 10.)\n",
    "\n",
    "<div class=\"alert alert-warning\">\n",
    "    \n",
    "**Warning: Setting a low `min_frequency`**\n",
    "\n",
    "Be aware that depending on the size of your corpus, ngram tables can be massive. So be cautious when setting the threshold to or near zero.\n",
    "\n",
    "</div>\n",
    "\n",
    "The returned table reports the raw count (AF) alongside the relative frequency (RF) and range."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "168da0a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "nc = ds.ngrams(ds_tokens, span=3, min_frequency=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "f91090b2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 9)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token_1</th><th>Token_2</th><th>Token_3</th><th>Tag_1</th><th>Tag_2</th><th>Tag_3</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;part&quot;</td><td>&quot;time&quot;</td><td>&quot;faculty&quot;</td><td>&quot;NN1&quot;</td><td>&quot;NNT1&quot;</td><td>&quot;NN1&quot;</td><td>124</td><td>933.97406</td><td>2.0</td></tr><tr><td>&quot;of&quot;</td><td>&quot;part&quot;</td><td>&quot;time&quot;</td><td>&quot;IO&quot;</td><td>&quot;NN1&quot;</td><td>&quot;NNT1&quot;</td><td>53</td><td>399.19859</td><td>2.0</td></tr><tr><td>&quot;one&quot;</td><td>&quot;of&quot;</td><td>&quot;the&quot;</td><td>&quot;MC1&quot;</td><td>&quot;IO&quot;</td><td>&quot;AT&quot;</td><td>41</td><td>308.814004</td><td>48.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;pardoner&quot;</td><td>&quot;&#x27;s&quot;</td><td>&quot;AT&quot;</td><td>&quot;NP1&quot;</td><td>&quot;GE&quot;</td><td>40</td><td>301.281955</td><td>2.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;fact&quot;</td><td>&quot;that&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN1&quot;</td><td>&quot;CST&quot;</td><td>34</td><td>256.089662</td><td>36.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;number&quot;</td><td>&quot;of&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN1&quot;</td><td>&quot;IO&quot;</td><td>32</td><td>241.025564</td><td>18.0</td></tr><tr><td>&quot;there&quot;</td><td>&quot;is&quot;</td><td>&quot;a&quot;</td><td>&quot;EX&quot;</td><td>&quot;VBZ&quot;</td><td>&quot;AT1&quot;</td><td>31</td><td>233.493515</td><td>44.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;effects&quot;</td><td>&quot;of&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN2&quot;</td><td>&quot;IO&quot;</td><td>30</td><td>225.961466</td><td>20.0</td></tr><tr><td>&quot;more&quot;</td><td>&quot;likely&quot;</td><td>&quot;to&quot
;</td><td>&quot;RGR&quot;</td><td>&quot;JJ&quot;</td><td>&quot;TO&quot;</td><td>29</td><td>218.429417</td><td>16.0</td></tr><tr><td>&quot;at&quot;</td><td>&quot;community&quot;</td><td>&quot;colleges&quot;</td><td>&quot;II&quot;</td><td>&quot;NN1&quot;</td><td>&quot;NN2&quot;</td><td>28</td><td>210.897368</td><td>2.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 9)\n",
       "┌─────────┬───────────┬──────────┬───────┬───┬───────┬─────┬────────────┬───────┐\n",
       "│ Token_1 ┆ Token_2   ┆ Token_3  ┆ Tag_1 ┆ … ┆ Tag_3 ┆ AF  ┆ RF         ┆ Range │\n",
       "│ ---     ┆ ---       ┆ ---      ┆ ---   ┆   ┆ ---   ┆ --- ┆ ---        ┆ ---   │\n",
       "│ str     ┆ str       ┆ str      ┆ str   ┆   ┆ str   ┆ u32 ┆ f64        ┆ f64   │\n",
       "╞═════════╪═══════════╪══════════╪═══════╪═══╪═══════╪═════╪════════════╪═══════╡\n",
       "│ part    ┆ time      ┆ faculty  ┆ NN1   ┆ … ┆ NN1   ┆ 124 ┆ 933.97406  ┆ 2.0   │\n",
       "│ of      ┆ part      ┆ time     ┆ IO    ┆ … ┆ NNT1  ┆ 53  ┆ 399.19859  ┆ 2.0   │\n",
       "│ one     ┆ of        ┆ the      ┆ MC1   ┆ … ┆ AT    ┆ 41  ┆ 308.814004 ┆ 48.0  │\n",
       "│ the     ┆ pardoner  ┆ 's       ┆ AT    ┆ … ┆ GE    ┆ 40  ┆ 301.281955 ┆ 2.0   │\n",
       "│ the     ┆ fact      ┆ that     ┆ AT    ┆ … ┆ CST   ┆ 34  ┆ 256.089662 ┆ 36.0  │\n",
       "│ the     ┆ number    ┆ of       ┆ AT    ┆ … ┆ IO    ┆ 32  ┆ 241.025564 ┆ 18.0  │\n",
       "│ there   ┆ is        ┆ a        ┆ EX    ┆ … ┆ AT1   ┆ 31  ┆ 233.493515 ┆ 44.0  │\n",
       "│ the     ┆ effects   ┆ of       ┆ AT    ┆ … ┆ IO    ┆ 30  ┆ 225.961466 ┆ 20.0  │\n",
       "│ more    ┆ likely    ┆ to       ┆ RGR   ┆ … ┆ TO    ┆ 29  ┆ 218.429417 ┆ 16.0  │\n",
       "│ at      ┆ community ┆ colleges ┆ II    ┆ … ┆ NN2   ┆ 28  ┆ 210.897368 ┆ 2.0   │\n",
       "└─────────┴───────────┴──────────┴───────┴───┴───────┴─────┴────────────┴───────┘"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nc.head(10)"
   ]
  },
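  {
   "cell_type": "markdown",
   "id": "d93f21b6",
   "metadata": {},
   "source": [
    "The idea behind an ngram table with a frequency threshold can be sketched in a few lines of plain Python. This is only an illustration on an invented token list, not how the `ngrams` function is implemented, and the helper name `count_ngrams` is hypothetical.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def count_ngrams(tokens, span=3, min_frequency=2):\n",
    "    # Slide a window of length `span` over the tokens and count each sequence.\n",
    "    windows = zip(*(tokens[i:] for i in range(span)))\n",
    "    counts = Counter(windows)\n",
    "    # Keep only the sequences that meet the frequency threshold.\n",
    "    return {gram: n for gram, n in counts.items() if n >= min_frequency}\n",
    "\n",
    "tokens = 'one of the best and one of the worst and one of the rest'.split()\n",
    "print(count_ngrams(tokens, span=3, min_frequency=2))\n",
    "```\n",
    "\n",
    "Lowering the threshold to 1 keeps every distinct sequence, which is why tables built with a low `min_frequency` can grow so quickly."
   ]
  },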
  {
   "cell_type": "markdown",
   "id": "bc9ae1b8",
   "metadata": {},
   "source": [
    "### Clusters\n",
    "\n",
    "Clusters are token sequences built around a chosen word or tag. They can be created using different options:\n",
    "* You can input a word or string using the `clusters_by_token` function. With that function you need to specify whether that input should match a token completely or partially, and choose which tagset to return.\n",
    "* Alternatively, you can use the `clusters_by_tag` function. That allows you to select a tag (like **NN1** or **AcademicTerms**) as the basis for your clusters.\n",
    "* For either option, you must select the size of your clusters (2-grams, 3-grams, or 4-grams) and the slot where your chosen word or tag should appear (on the left, in the middle, or on the right).\n",
    "\n",
    "We'll start by searching for clusters of length **3** with **data** in the first position. The returned data frame includes both the sequence of tokens and the sequence of tags:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "91f1d33d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 9)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token_1</th><th>Token_2</th><th>Token_3</th><th>Tag_1</th><th>Tag_2</th><th>Tag_3</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;data&quot;</td><td>&quot;from&quot;</td><td>&quot;the&quot;</td><td>&quot;NN&quot;</td><td>&quot;II&quot;</td><td>&quot;AT&quot;</td><td>6</td><td>45.192293</td><td>19.047619</td></tr><tr><td>&quot;data&quot;</td><td>&quot;was&quot;</td><td>&quot;recorded&quot;</td><td>&quot;NN&quot;</td><td>&quot;VBDZ&quot;</td><td>&quot;VVN&quot;</td><td>3</td><td>22.596147</td><td>4.761905</td></tr><tr><td>&quot;data&quot;</td><td>&quot;collection&quot;</td><td>&quot;process&quot;</td><td>&quot;NN&quot;</td><td>&quot;NN1&quot;</td><td>&quot;NN1&quot;</td><td>3</td><td>22.596147</td><td>4.761905</td></tr><tr><td>&quot;data&quot;</td><td>&quot;is&quot;</td><td>&quot;by&quot;</td><td>&quot;NN&quot;</td><td>&quot;VBZ&quot;</td><td>&quot;II&quot;</td><td>2</td><td>15.064098</td><td>4.761905</td></tr><tr><td>&quot;data&quot;</td><td>&quot;collection&quot;</td><td>&quot;will&quot;</td><td>&quot;NN&quot;</td><td>&quot;NN1&quot;</td><td>&quot;VM&quot;</td><td>2</td><td>15.064098</td><td>4.761905</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 9)\n",
       "┌─────────┬────────────┬──────────┬───────┬───┬───────┬─────┬───────────┬───────────┐\n",
       "│ Token_1 ┆ Token_2    ┆ Token_3  ┆ Tag_1 ┆ … ┆ Tag_3 ┆ AF  ┆ RF        ┆ Range     │\n",
       "│ ---     ┆ ---        ┆ ---      ┆ ---   ┆   ┆ ---   ┆ --- ┆ ---       ┆ ---       │\n",
       "│ str     ┆ str        ┆ str      ┆ str   ┆   ┆ str   ┆ u32 ┆ f64       ┆ f64       │\n",
       "╞═════════╪════════════╪══════════╪═══════╪═══╪═══════╪═════╪═══════════╪═══════════╡\n",
       "│ data    ┆ from       ┆ the      ┆ NN    ┆ … ┆ AT    ┆ 6   ┆ 45.192293 ┆ 19.047619 │\n",
       "│ data    ┆ was        ┆ recorded ┆ NN    ┆ … ┆ VVN   ┆ 3   ┆ 22.596147 ┆ 4.761905  │\n",
       "│ data    ┆ collection ┆ process  ┆ NN    ┆ … ┆ NN1   ┆ 3   ┆ 22.596147 ┆ 4.761905  │\n",
       "│ data    ┆ is         ┆ by       ┆ NN    ┆ … ┆ II    ┆ 2   ┆ 15.064098 ┆ 4.761905  │\n",
       "│ data    ┆ collection ┆ will     ┆ NN    ┆ … ┆ VM    ┆ 2   ┆ 15.064098 ┆ 4.761905  │\n",
       "└─────────┴────────────┴──────────┴───────┴───┴───────┴─────┴───────────┴───────────┘"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.clusters_by_token(ds_tokens, node_word='data', node_position=1, span=3).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "376f2059-5696-4116-b04c-647004bcad6b",
   "metadata": {},
   "source": [
    "We can similarly look for clusters that include only part of a word. For example, we can find bigrams that include a word ending with **-tion** by setting the `search_type` to **ends_with**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "612c1654-e0c9-459d-898e-6da07522ef07",
   "metadata": {},
   "outputs": [],
   "source": [
    "nc = ds.clusters_by_token(ds_tokens, node_word='tion', node_position=2, span=2, search_type='ends_with', count_by='pos')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "f8930648-64a0-47a0-976f-25242c1dd5c5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token_1</th><th>Token_2</th><th>Tag_1</th><th>Tag_2</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;the&quot;</td><td>&quot;intervention&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN1&quot;</td><td>34</td><td>256.089662</td><td>2.0</td></tr><tr><td>&quot;citizenship&quot;</td><td>&quot;education&quot;</td><td>&quot;NN1&quot;</td><td>&quot;NN1&quot;</td><td>30</td><td>225.961466</td><td>2.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;nation&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN1&quot;</td><td>27</td><td>203.365319</td><td>12.0</td></tr><tr><td>&quot;data&quot;</td><td>&quot;collection&quot;</td><td>&quot;NN&quot;</td><td>&quot;NN1&quot;</td><td>17</td><td>128.044831</td><td>8.0</td></tr><tr><td>&quot;higher&quot;</td><td>&quot;education&quot;</td><td>&quot;JJR&quot;</td><td>&quot;NN1&quot;</td><td>16</td><td>120.512782</td><td>4.0</td></tr><tr><td>&quot;of&quot;</td><td>&quot;education&quot;</td><td>&quot;IO&quot;</td><td>&quot;NN1&quot;</td><td>16</td><td>120.512782</td><td>8.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;formation&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN1&quot;</td><td>15</td><td>112.980733</td><td>8.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;notion&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN1&quot;</td><td>15</td><td>112.980733</td><td>16.0</td></tr><tr><td>&quot;brow&quot;</td><td>&quot;manipulation&quot;</td><td>&quot;NN1&quot;</td><td>&quot;NN1&quot;</td><td>14</td><td>105.448684</td><td>2.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;manipulation&quot;</td><td>&quot;AT&quot;</td><td>&quot;NN1&quot;</td><td>13</td><td>97.916635</td><td>2.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 7)\n",
       "┌─────────────┬──────────────┬───────┬───────┬─────┬────────────┬───────┐\n",
       "│ Token_1     ┆ Token_2      ┆ Tag_1 ┆ Tag_2 ┆ AF  ┆ RF         ┆ Range │\n",
       "│ ---         ┆ ---          ┆ ---   ┆ ---   ┆ --- ┆ ---        ┆ ---   │\n",
       "│ str         ┆ str          ┆ str   ┆ str   ┆ u32 ┆ f64        ┆ f64   │\n",
       "╞═════════════╪══════════════╪═══════╪═══════╪═════╪════════════╪═══════╡\n",
       "│ the         ┆ intervention ┆ AT    ┆ NN1   ┆ 34  ┆ 256.089662 ┆ 2.0   │\n",
       "│ citizenship ┆ education    ┆ NN1   ┆ NN1   ┆ 30  ┆ 225.961466 ┆ 2.0   │\n",
       "│ the         ┆ nation       ┆ AT    ┆ NN1   ┆ 27  ┆ 203.365319 ┆ 12.0  │\n",
       "│ data        ┆ collection   ┆ NN    ┆ NN1   ┆ 17  ┆ 128.044831 ┆ 8.0   │\n",
       "│ higher      ┆ education    ┆ JJR   ┆ NN1   ┆ 16  ┆ 120.512782 ┆ 4.0   │\n",
       "│ of          ┆ education    ┆ IO    ┆ NN1   ┆ 16  ┆ 120.512782 ┆ 8.0   │\n",
       "│ the         ┆ formation    ┆ AT    ┆ NN1   ┆ 15  ┆ 112.980733 ┆ 8.0   │\n",
       "│ the         ┆ notion       ┆ AT    ┆ NN1   ┆ 15  ┆ 112.980733 ┆ 16.0  │\n",
       "│ brow        ┆ manipulation ┆ NN1   ┆ NN1   ┆ 14  ┆ 105.448684 ┆ 2.0   │\n",
       "│ the         ┆ manipulation ┆ AT    ┆ NN1   ┆ 13  ┆ 97.916635  ┆ 2.0   │\n",
       "└─────────────┴──────────────┴───────┴───────┴─────┴────────────┴───────┘"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nc.head(10)"
   ]
  },
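  {
   "cell_type": "markdown",
   "id": "f05b6a97",
   "metadata": {},
   "source": [
    "To show how the node, its position, and the search type interact, here is a small sketch that selects clusters from an invented token list. It is not the package's code: the helper `toy_clusters` is hypothetical, and of the search types it handles only `ends_with` appears in this tutorial, while `fixed` and `starts_with` are assumed names used for illustration.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def toy_clusters(tokens, node_word, node_position=1, span=2, search_type='fixed'):\n",
    "    # Choose how the node is compared with the token in the chosen slot.\n",
    "    matchers = {\n",
    "        'fixed': lambda tok: tok == node_word,\n",
    "        'starts_with': lambda tok: tok.startswith(node_word),\n",
    "        'ends_with': lambda tok: tok.endswith(node_word),\n",
    "    }\n",
    "    match = matchers[search_type]\n",
    "    windows = zip(*(tokens[i:] for i in range(span)))\n",
    "    # node_position is 1-based, as in the calls above.\n",
    "    return Counter(w for w in windows if match(w[node_position - 1]))\n",
    "\n",
    "tokens = 'the data collection and the data collection process'.split()\n",
    "found = toy_clusters(tokens, 'tion', node_position=2, span=2, search_type='ends_with')\n",
    "print(found)\n",
    "```\n",
    "\n",
    "Moving `node_position` to 1 with the same settings would instead return the bigrams that begin with a word ending in **-tion**."
   ]
  },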
  {
   "cell_type": "markdown",
   "id": "f29d7996",
   "metadata": {},
   "source": [
    "Now we'll collect clusters using the `clusters_by_tag` function. Here, we'll look at 3-token sequences that end with a past participle (**VVN**)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "a3b82ccd",
   "metadata": {},
   "outputs": [],
   "source": [
    "nc = ds.clusters_by_tag(ds_tokens, tag='VVN', tag_position=3, span=3, count_by='pos')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "ad4feaf9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 9)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token_1</th><th>Token_2</th><th>Token_3</th><th>Tag_1</th><th>Tag_2</th><th>Tag_3</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;can&quot;</td><td>&quot;be&quot;</td><td>&quot;seen&quot;</td><td>&quot;VM&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>17</td><td>128.044831</td><td>16.0</td></tr><tr><td>&quot;to&quot;</td><td>&quot;be&quot;</td><td>&quot;used&quot;</td><td>&quot;TO&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>10</td><td>75.320489</td><td>14.0</td></tr><tr><td>&quot;can&quot;</td><td>&quot;be&quot;</td><td>&quot;used&quot;</td><td>&quot;VM&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>10</td><td>75.320489</td><td>14.0</td></tr><tr><td>&quot;will&quot;</td><td>&quot;be&quot;</td><td>&quot;asked&quot;</td><td>&quot;VM&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>7</td><td>52.724342</td><td>8.0</td></tr><tr><td>&quot;should&quot;</td><td>&quot;be&quot;</td><td>&quot;noted&quot;</td><td>&quot;VM&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>7</td><td>52.724342</td><td>8.0</td></tr><tr><td>&quot;could&quot;</td><td>&quot;be&quot;</td><td>&quot;used&quot;</td><td>&quot;VM&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>7</td><td>52.724342</td><td>10.0</td></tr><tr><td>&quot;has&quot;</td><td>&quot;been&quot;</td><td>&quot;shown&quot;</td><td>&quot;VHZ&quot;</td><td>&quot;VBN&quot;</td><td>&quot;VVN&quot;</td><td>6</td><td>45.192293</td><td>8.0</td></tr><tr><td>&quot;will&quot;</td><td>&quot;be&quot;</td><td>&quot;used&quot;</td><td>&quot;VM&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>5</td><td>37.660244</td><td>4.0</td></tr><tr><td>&quot;can&quot;</td><td>&quot;be&quot;</td><td>&quot;observed&quot;</td><td>&quot;VM&
quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>5</td><td>37.660244</td><td>4.0</td></tr><tr><td>&quot;can&quot;</td><td>&quot;be&quot;</td><td>&quot;found&quot;</td><td>&quot;VM&quot;</td><td>&quot;VBI&quot;</td><td>&quot;VVN&quot;</td><td>5</td><td>37.660244</td><td>8.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 9)\n",
       "┌─────────┬─────────┬──────────┬───────┬───┬───────┬─────┬────────────┬───────┐\n",
       "│ Token_1 ┆ Token_2 ┆ Token_3  ┆ Tag_1 ┆ … ┆ Tag_3 ┆ AF  ┆ RF         ┆ Range │\n",
       "│ ---     ┆ ---     ┆ ---      ┆ ---   ┆   ┆ ---   ┆ --- ┆ ---        ┆ ---   │\n",
       "│ str     ┆ str     ┆ str      ┆ str   ┆   ┆ str   ┆ u32 ┆ f64        ┆ f64   │\n",
       "╞═════════╪═════════╪══════════╪═══════╪═══╪═══════╪═════╪════════════╪═══════╡\n",
       "│ can     ┆ be      ┆ seen     ┆ VM    ┆ … ┆ VVN   ┆ 17  ┆ 128.044831 ┆ 16.0  │\n",
       "│ to      ┆ be      ┆ used     ┆ TO    ┆ … ┆ VVN   ┆ 10  ┆ 75.320489  ┆ 14.0  │\n",
       "│ can     ┆ be      ┆ used     ┆ VM    ┆ … ┆ VVN   ┆ 10  ┆ 75.320489  ┆ 14.0  │\n",
       "│ will    ┆ be      ┆ asked    ┆ VM    ┆ … ┆ VVN   ┆ 7   ┆ 52.724342  ┆ 8.0   │\n",
       "│ should  ┆ be      ┆ noted    ┆ VM    ┆ … ┆ VVN   ┆ 7   ┆ 52.724342  ┆ 8.0   │\n",
       "│ could   ┆ be      ┆ used     ┆ VM    ┆ … ┆ VVN   ┆ 7   ┆ 52.724342  ┆ 10.0  │\n",
       "│ has     ┆ been    ┆ shown    ┆ VHZ   ┆ … ┆ VVN   ┆ 6   ┆ 45.192293  ┆ 8.0   │\n",
       "│ will    ┆ be      ┆ used     ┆ VM    ┆ … ┆ VVN   ┆ 5   ┆ 37.660244  ┆ 4.0   │\n",
       "│ can     ┆ be      ┆ observed ┆ VM    ┆ … ┆ VVN   ┆ 5   ┆ 37.660244  ┆ 4.0   │\n",
       "│ can     ┆ be      ┆ found    ┆ VM    ┆ … ┆ VVN   ┆ 5   ┆ 37.660244  ┆ 8.0   │\n",
       "└─────────┴─────────┴──────────┴───────┴───┴───────┴─────┴────────────┴───────┘"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nc.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b90a3e8",
   "metadata": {},
   "source": [
    "Similar cluster tables can be created for DocuScope tags. Here we generate trigrams whose third token is tagged `AcademicTerms`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "af325a31",
   "metadata": {},
   "outputs": [],
   "source": [
    "nc = ds.clusters_by_tag(ds_tokens, tag='AcademicTerms', tag_position=3, span=3, count_by='ds')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "83b7953b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 9)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token_1</th><th>Token_2</th><th>Token_3</th><th>Tag_1</th><th>Tag_2</th><th>Tag_3</th><th>AF</th><th>RF</th><th>Range</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>str</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;part&quot;</td><td>&quot;time&quot;</td><td>&quot;faculty&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;InformationTopics&quot;</td><td>&quot;AcademicTerms&quot;</td><td>112</td><td>1028.872741</td><td>2.0</td></tr><tr><td>&quot;nicaraguan&quot;</td><td>&quot;sign&quot;</td><td>&quot;language&quot;</td><td>&quot;Character&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;AcademicTerms&quot;</td><td>13</td><td>119.422729</td><td>2.0</td></tr><tr><td>&quot;full&quot;</td><td>&quot;time&quot;</td><td>&quot;faculty&quot;</td><td>&quot;AcademicTerms&quot;</td><td>&quot;InformationTopics&quot;</td><td>&quot;AcademicTerms&quot;</td><td>11</td><td>101.050001</td><td>2.0</td></tr><tr><td>&quot;of&quot;</td><td>&quot;citizenship&quot;</td><td>&quot;education&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;PublicTerms&quot;</td><td>&quot;AcademicTerms&quot;</td><td>10</td><td>91.863638</td><td>2.0</td></tr><tr><td>&quot;reinforced&quot;</td><td>&quot;concrete&quot;</td><td>&quot;structures&quot;</td><td>&quot;InformationChangePositive&quot;</td><td>&quot;Description&quot;</td><td>&quot;AcademicTerms&quot;</td><td>9</td><td>82.677274</td><td>2.0</td></tr><tr><td>&quot;national&quot;</td><td>&quot;identity&quot;</td><td>&quot;formation&quot;</td><td>&quot;PublicTerms&quot;</td><td>&quot;AcademicTerms&quot;</td><td>&quot;AcademicTerms&quot;</td><td>8</td><td>73.49091</td><td>2.0</td></tr><tr><td>&quot;of&quot;</td><td>&quot;an&quot;</td><td>&quot;electron&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;AcademicTerms&quot;</td><td>8</td><td>73.49091</td><td>2.0</td></tr><tr><td>&quot;facult
y&quot;</td><td>&quot;in&quot;</td><td>&quot;higher education&quot;</td><td>&quot;AcademicTerms&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;AcademicTerms&quot;</td><td>7</td><td>64.304546</td><td>2.0</td></tr><tr><td>&quot;academy&quot;</td><td>&quot;of&quot;</td><td>&quot;pediatrics&quot;</td><td>&quot;InformationTopics&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;AcademicTerms&quot;</td><td>7</td><td>64.304546</td><td>2.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;rate of&quot;</td><td>&quot;photosynthesis&quot;</td><td>&quot;Untagged&quot;</td><td>&quot;AcademicTerms&quot;</td><td>&quot;AcademicTerms&quot;</td><td>7</td><td>64.304546</td><td>2.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 9)\n",
       "┌────────────┬─────────────┬─────────────┬─────────────┬───┬────────────┬─────┬────────────┬───────┐\n",
       "│ Token_1    ┆ Token_2     ┆ Token_3     ┆ Tag_1       ┆ … ┆ Tag_3      ┆ AF  ┆ RF         ┆ Range │\n",
       "│ ---        ┆ ---         ┆ ---         ┆ ---         ┆   ┆ ---        ┆ --- ┆ ---        ┆ ---   │\n",
       "│ str        ┆ str         ┆ str         ┆ str         ┆   ┆ str        ┆ u32 ┆ f64        ┆ f64   │\n",
       "╞════════════╪═════════════╪═════════════╪═════════════╪═══╪════════════╪═════╪════════════╪═══════╡\n",
       "│ part       ┆ time        ┆ faculty     ┆ Untagged    ┆ … ┆ AcademicTe ┆ 112 ┆ 1028.87274 ┆ 2.0   │\n",
       "│            ┆             ┆             ┆             ┆   ┆ rms        ┆     ┆ 1          ┆       │\n",
       "│ nicaraguan ┆ sign        ┆ language    ┆ Character   ┆ … ┆ AcademicTe ┆ 13  ┆ 119.422729 ┆ 2.0   │\n",
       "│            ┆             ┆             ┆             ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│ full       ┆ time        ┆ faculty     ┆ AcademicTer ┆ … ┆ AcademicTe ┆ 11  ┆ 101.050001 ┆ 2.0   │\n",
       "│            ┆             ┆             ┆ ms          ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│ of         ┆ citizenship ┆ education   ┆ Untagged    ┆ … ┆ AcademicTe ┆ 10  ┆ 91.863638  ┆ 2.0   │\n",
       "│            ┆             ┆             ┆             ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│ reinforced ┆ concrete    ┆ structures  ┆ Information ┆ … ┆ AcademicTe ┆ 9   ┆ 82.677274  ┆ 2.0   │\n",
       "│            ┆             ┆             ┆ ChangePosit ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│            ┆             ┆             ┆ ive         ┆   ┆            ┆     ┆            ┆       │\n",
       "│ national   ┆ identity    ┆ formation   ┆ PublicTerms ┆ … ┆ AcademicTe ┆ 8   ┆ 73.49091   ┆ 2.0   │\n",
       "│            ┆             ┆             ┆             ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│ of         ┆ an          ┆ electron    ┆ Untagged    ┆ … ┆ AcademicTe ┆ 8   ┆ 73.49091   ┆ 2.0   │\n",
       "│            ┆             ┆             ┆             ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│ faculty    ┆ in          ┆ higher      ┆ AcademicTer ┆ … ┆ AcademicTe ┆ 7   ┆ 64.304546  ┆ 2.0   │\n",
       "│            ┆             ┆ education   ┆ ms          ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│ academy    ┆ of          ┆ pediatrics  ┆ Information ┆ … ┆ AcademicTe ┆ 7   ┆ 64.304546  ┆ 2.0   │\n",
       "│            ┆             ┆             ┆ Topics      ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "│ the        ┆ rate of     ┆ photosynthe ┆ Untagged    ┆ … ┆ AcademicTe ┆ 7   ┆ 64.304546  ┆ 2.0   │\n",
       "│            ┆             ┆ sis         ┆             ┆   ┆ rms        ┆     ┆            ┆       │\n",
       "└────────────┴─────────────┴─────────────┴─────────────┴───┴────────────┴─────┴────────────┴───────┘"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nc.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6478e5a",
   "metadata": {},
   "source": [
    "## Collocations\n",
    "\n",
    "Collocations within a span (left and right) of a node word can be calculated according to several association measures.\n",
    "\n",
    "The default span is 4 tokens to the left and 4 tokens to the right of the node word.\n",
    "\n",
    "Like `frequency_table`, `coll_table` requires a table of the type generated by the `docuscope_parse` function. It also requires a node word."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "194bdd0d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>Freq Span</th><th>Freq Total</th><th>MI</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td><td>f64</td></tr></thead><tbody><tr><td>&quot;collection&quot;</td><td>&quot;NN1&quot;</td><td>18</td><td>23</td><td>0.721679</td></tr><tr><td>&quot;collected&quot;</td><td>&quot;VVN&quot;</td><td>10</td><td>12</td><td>0.683613</td></tr><tr><td>&quot;conjunctions&quot;</td><td>&quot;NN2&quot;</td><td>2</td><td>1</td><td>0.66337</td></tr><tr><td>&quot;split&quot;</td><td>&quot;VV0&quot;</td><td>2</td><td>1</td><td>0.66337</td></tr><tr><td>&quot;weighting&quot;</td><td>&quot;NN1&quot;</td><td>2</td><td>1</td><td>0.66337</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 5)\n",
       "┌──────────────┬─────┬───────────┬────────────┬──────────┐\n",
       "│ Token        ┆ Tag ┆ Freq Span ┆ Freq Total ┆ MI       │\n",
       "│ ---          ┆ --- ┆ ---       ┆ ---        ┆ ---      │\n",
       "│ str          ┆ str ┆ u32       ┆ u32        ┆ f64      │\n",
       "╞══════════════╪═════╪═══════════╪════════════╪══════════╡\n",
       "│ collection   ┆ NN1 ┆ 18        ┆ 23         ┆ 0.721679 │\n",
       "│ collected    ┆ VVN ┆ 10        ┆ 12         ┆ 0.683613 │\n",
       "│ conjunctions ┆ NN2 ┆ 2         ┆ 1          ┆ 0.66337  │\n",
       "│ split        ┆ VV0 ┆ 2         ┆ 1          ┆ 0.66337  │\n",
       "│ weighting    ┆ NN1 ┆ 2         ┆ 1          ┆ 0.66337  │\n",
       "└──────────────┴─────┴───────────┴────────────┴──────────┘"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.coll_table(ds_tokens, 'data').head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5f63e1b",
   "metadata": {},
   "source": [
    "You can also specify a node tag (by default, tags are ignored) and an association measure from the pointwise mutual information family ('pmi', 'pmi2', 'pmi3', or 'npmi', which is the default)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "7859f327",
   "metadata": {},
   "outputs": [],
   "source": [
    "ct = ds.coll_table(ds_tokens, 'can', node_tag='V', statistic='pmi', count_by='pos')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "9c9b8445",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>Freq Span</th><th>Freq Total</th><th>MI</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td><td>f64</td></tr></thead><tbody><tr><td>&quot;perceive&quot;</td><td>&quot;NN1&quot;</td><td>2</td><td>1</td><td>9.294012</td></tr><tr><td>&quot;undone&quot;</td><td>&quot;VVN&quot;</td><td>2</td><td>1</td><td>9.294012</td></tr><tr><td>&quot;1b&quot;</td><td>&quot;FO&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr><tr><td>&quot;abrasion&quot;</td><td>&quot;NN1&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr><tr><td>&quot;abrogate&quot;</td><td>&quot;VVI&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr><tr><td>&quot;absorb&quot;</td><td>&quot;VVI&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr><tr><td>&quot;additives&quot;</td><td>&quot;VVZ&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr><tr><td>&quot;altered&quot;</td><td>&quot;JJ&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr><tr><td>&quot;ameliorate&quot;</td><td>&quot;VVI&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr><tr><td>&quot;anew&quot;</td><td>&quot;RR&quot;</td><td>1</td><td>1</td><td>8.294012</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 5)\n",
       "┌────────────┬─────┬───────────┬────────────┬──────────┐\n",
       "│ Token      ┆ Tag ┆ Freq Span ┆ Freq Total ┆ MI       │\n",
       "│ ---        ┆ --- ┆ ---       ┆ ---        ┆ ---      │\n",
       "│ str        ┆ str ┆ u32       ┆ u32        ┆ f64      │\n",
       "╞════════════╪═════╪═══════════╪════════════╪══════════╡\n",
       "│ perceive   ┆ NN1 ┆ 2         ┆ 1          ┆ 9.294012 │\n",
       "│ undone     ┆ VVN ┆ 2         ┆ 1          ┆ 9.294012 │\n",
       "│ 1b         ┆ FO  ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "│ abrasion   ┆ NN1 ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "│ abrogate   ┆ VVI ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "│ absorb     ┆ VVI ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "│ additives  ┆ VVZ ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "│ altered    ┆ JJ  ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "│ ameliorate ┆ VVI ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "│ anew       ┆ RR  ┆ 1         ┆ 1          ┆ 8.294012 │\n",
       "└────────────┴─────┴───────────┴────────────┴──────────┘"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "3fb3face",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (187, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>Freq Span</th><th>Freq Total</th><th>MI</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td><td>f64</td></tr></thead><tbody><tr><td>&quot;assume&quot;</td><td>&quot;VVI&quot;</td><td>6</td><td>9</td><td>7.70905</td></tr><tr><td>&quot;arise&quot;</td><td>&quot;VVI&quot;</td><td>3</td><td>6</td><td>7.294012</td></tr><tr><td>&quot;occur&quot;</td><td>&quot;VVI&quot;</td><td>11</td><td>23</td><td>7.229882</td></tr><tr><td>&quot;seen&quot;</td><td>&quot;VVN&quot;</td><td>18</td><td>39</td><td>7.178535</td></tr><tr><td>&quot;achieved&quot;</td><td>&quot;VVN&quot;</td><td>3</td><td>7</td><td>7.07162</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;have&quot;</td><td>&quot;VH0&quot;</td><td>2</td><td>296</td><td>1.084559</td></tr><tr><td>&quot;was&quot;</td><td>&quot;VBDZ&quot;</td><td>4</td><td>594</td><td>1.079693</td></tr><tr><td>&quot;is&quot;</td><td>&quot;VBZ&quot;</td><td>11</td><td>1784</td><td>0.952544</td></tr><tr><td>&quot;does&quot;</td><td>&quot;VDZ&quot;</td><td>1</td><td>165</td><td>0.92769</td></tr><tr><td>&quot;will&quot;</td><td>&quot;VM&quot;</td><td>2</td><td>512</td><td>0.294012</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (187, 5)\n",
       "┌──────────┬──────┬───────────┬────────────┬──────────┐\n",
       "│ Token    ┆ Tag  ┆ Freq Span ┆ Freq Total ┆ MI       │\n",
       "│ ---      ┆ ---  ┆ ---       ┆ ---        ┆ ---      │\n",
       "│ str      ┆ str  ┆ u32       ┆ u32        ┆ f64      │\n",
       "╞══════════╪══════╪═══════════╪════════════╪══════════╡\n",
       "│ assume   ┆ VVI  ┆ 6         ┆ 9          ┆ 7.70905  │\n",
       "│ arise    ┆ VVI  ┆ 3         ┆ 6          ┆ 7.294012 │\n",
       "│ occur    ┆ VVI  ┆ 11        ┆ 23         ┆ 7.229882 │\n",
       "│ seen     ┆ VVN  ┆ 18        ┆ 39         ┆ 7.178535 │\n",
       "│ achieved ┆ VVN  ┆ 3         ┆ 7          ┆ 7.07162  │\n",
       "│ …        ┆ …    ┆ …         ┆ …          ┆ …        │\n",
       "│ have     ┆ VH0  ┆ 2         ┆ 296        ┆ 1.084559 │\n",
       "│ was      ┆ VBDZ ┆ 4         ┆ 594        ┆ 1.079693 │\n",
       "│ is       ┆ VBZ  ┆ 11        ┆ 1784       ┆ 0.952544 │\n",
       "│ does     ┆ VDZ  ┆ 1         ┆ 165        ┆ 0.92769  │\n",
       "│ will     ┆ VM   ┆ 2         ┆ 512        ┆ 0.294012 │\n",
       "└──────────┴──────┴───────────┴────────────┴──────────┘"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct.filter(\n",
    "    (pl.col(\"Freq Total\") > 5) &\n",
    "    (pl.col(\"Tag\").str.starts_with(\"V\"))\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "c9c6900f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 5)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>Freq Span</th><th>Freq Total</th><th>MI</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td><td>f64</td></tr></thead><tbody><tr><td>&quot;believing that&quot;</td><td>&quot;Character&quot;</td><td>2</td><td>3</td><td>-21.383312</td></tr><tr><td>&quot;cure&quot;</td><td>&quot;Positive&quot;</td><td>2</td><td>3</td><td>-21.383312</td></tr><tr><td>&quot;falsely&quot;</td><td>&quot;Negative&quot;</td><td>2</td><td>3</td><td>-21.383312</td></tr><tr><td>&quot;of&quot;</td><td>&quot;Untagged&quot;</td><td>20</td><td>3148</td><td>-21.452785</td></tr><tr><td>&quot;more and more&quot;</td><td>&quot;ForceStressed&quot;</td><td>2</td><td>4</td><td>-21.798349</td></tr><tr><td>&quot;infected&quot;</td><td>&quot;InformationChangeNegative&quot;</td><td>3</td><td>15</td><td>-21.950352</td></tr><tr><td>&quot;and&quot;</td><td>&quot;Untagged&quot;</td><td>18</td><td>3506</td><td>-22.064185</td></tr><tr><td>&quot;who had&quot;</td><td>&quot;Narrative&quot;</td><td>2</td><td>5</td><td>-22.120277</td></tr><tr><td>&quot;number&quot;</td><td>&quot;Untagged&quot;</td><td>4</td><td>44</td><td>-22.257781</td></tr><tr><td>&quot;sera&quot;</td><td>&quot;Description&quot;</td><td>2</td><td>6</td><td>-22.383312</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 5)\n",
       "┌────────────────┬───────────────────────────┬───────────┬────────────┬────────────┐\n",
       "│ Token          ┆ Tag                       ┆ Freq Span ┆ Freq Total ┆ MI         │\n",
       "│ ---            ┆ ---                       ┆ ---       ┆ ---        ┆ ---        │\n",
       "│ str            ┆ str                       ┆ u32       ┆ u32        ┆ f64        │\n",
       "╞════════════════╪═══════════════════════════╪═══════════╪════════════╪════════════╡\n",
       "│ believing that ┆ Character                 ┆ 2         ┆ 3          ┆ -21.383312 │\n",
       "│ cure           ┆ Positive                  ┆ 2         ┆ 3          ┆ -21.383312 │\n",
       "│ falsely        ┆ Negative                  ┆ 2         ┆ 3          ┆ -21.383312 │\n",
       "│ of             ┆ Untagged                  ┆ 20        ┆ 3148       ┆ -21.452785 │\n",
       "│ more and more  ┆ ForceStressed             ┆ 2         ┆ 4          ┆ -21.798349 │\n",
       "│ infected       ┆ InformationChangeNegative ┆ 3         ┆ 15         ┆ -21.950352 │\n",
       "│ and            ┆ Untagged                  ┆ 18        ┆ 3506       ┆ -22.064185 │\n",
       "│ who had        ┆ Narrative                 ┆ 2         ┆ 5          ┆ -22.120277 │\n",
       "│ number         ┆ Untagged                  ┆ 4         ┆ 44         ┆ -22.257781 │\n",
       "│ sera           ┆ Description               ┆ 2         ┆ 6          ┆ -22.383312 │\n",
       "└────────────────┴───────────────────────────┴───────────┴────────────┴────────────┘"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ct = ds.coll_table(ds_tokens, 'people', node_tag='Character', statistic='pmi3', count_by='ds')\n",
    "ct.head(10)"
   ]
  },
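  {
   "cell_type": "markdown",
   "id": "e4b7c2a1",
   "metadata": {},
   "source": [
    "The `statistic` options named above can be sketched with the usual textbook definitions of the pointwise mutual information family. This is an illustration rather than a description of the package's internals: the symbols $O$, $f(n)$, $f(c)$ and $N$ are introduced here, and the exact counting conventions (for example, whether and how the span width enters the calculation) are assumptions.\n",
    "\n",
    "Let $O$ be the number of times a collocate $c$ occurs within the span of the node $n$, $f(n)$ and $f(c)$ their total frequencies, and $N$ the corpus size, so that $p(n, c) = O/N$, $p(n) = f(n)/N$ and $p(c) = f(c)/N$. Then:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\mathrm{PMI}(n, c) &= \\log_2 \\frac{p(n, c)}{p(n)\\,p(c)} \\\\\n",
    "\\mathrm{PMI}^k(n, c) &= \\log_2 \\frac{p(n, c)^k}{p(n)\\,p(c)}, \\quad k \\in \\{2, 3\\} \\\\\n",
    "\\mathrm{NPMI}(n, c) &= \\frac{\\mathrm{PMI}(n, c)}{-\\log_2 p(n, c)}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "Raising the joint probability to a power ('pmi2', 'pmi3') reduces plain PMI's tendency to rank very rare collocates highest, and NPMI rescales PMI to the interval $[-1, 1]$. The base-2 logarithm is consistent with the 'pmi' table for *can* above, where rows that share a `Freq Total` of 1 differ by exactly 1.0 in `MI` when `Freq Span` doubles from 1 to 2."
   ]
  },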
  {
   "cell_type": "markdown",
   "id": "8c91c55b",
   "metadata": {},
   "source": [
    "## Document-term matrices for tags\n",
    "\n",
    "Document-term matrices are basic data structures for text analysis. Each row is a document (observation) and each column is a token (variable). These [can be produced by **tmtoolkit**](https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Generating-a-sparse-document-term-matrix-(DTM)) using the `dtm` function.\n",
    "\n",
    "The **docuscospacy** package allows for the creation of DTMs with tag counts (rather than token counts) as variables.\n",
    "\n",
    "These are produced by the `tags_dtm` function, which takes a table of the type generated by the `docuscope_parse` function and a `count_by` argument of either 'pos' (the default) or 'ds'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "d1b3d472",
   "metadata": {},
   "outputs": [],
   "source": [
    "tm = ds.tags_dtm(ds_tokens)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88ceee89",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "    \n",
    "**Warning: `doc_id` column**\n",
    "\n",
    "The first column, 'doc_id', contains the names of the document files. As a safety feature, the `tags_dtm` function does not initially place document ids as row names. Row names **must** be unique. Setting the document ids as a column allows users to account for any duplicates before proceeding.\n",
    "\n",
    "</div>\n",
    "\n",
    "The counts returned are raw (absolute) frequencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "315e05b0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 127)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th><th>NN1</th><th>JJ</th><th>AT</th><th>II</th><th>NN2</th><th>IO</th><th>NP1</th><th>CC</th><th>RR</th><th>VVI</th><th>AT1</th><th>VVN</th><th>MC</th><th>TO</th><th>VVG</th><th>VM</th><th>VBZ</th><th>VVZ</th><th>CST</th><th>VV0</th><th>DD1</th><th>VVD</th><th>APPGE</th><th>CS</th><th>IF</th><th>PPH1</th><th>IW</th><th>VBI</th><th>GE</th><th>XX</th><th>VBR</th><th>DDQ</th><th>NNT1</th><th>VBDZ</th><th>CSA</th><th>DD2</th><th>&hellip;</th><th>PPHO1</th><th>FW</th><th>PPX2</th><th>DAT</th><th>MC2</th><th>NNU2</th><th>NPM1</th><th>UH</th><th>VDI</th><th>VHG</th><th>NP2</th><th>VDN</th><th>NNB</th><th>PPIO2</th><th>MCMC</th><th>RGQ</th><th>VHN</th><th>DDQGE</th><th>PNQO</th><th>VDG</th><th>VBM</th><th>RRT</th><th>VMK</th><th>DDQV</th><th>PN</th><th>PPIO1</th><th>NNO2</th><th>NNU1</th><th>PPGE</th><th>NPD1</th><th>NNO</th><th>MF</th><th>PNQV</th><th>VVGK</th><th>RPK</th><th>RGQV</th><th>RRQV</th></tr><tr><td>str</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>&hellip;</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>252</td><td>62</td>
<td>99</td><td>70</td><td>69</td><td>83</td><td>2</td><td>14</td><td>24</td><td>23</td><td>24</td><td>52</td><td>28</td><td>13</td><td>13</td><td>20</td><td>16</td><td>5</td><td>15</td><td>2</td><td>22</td><td>12</td><td>0</td><td>5</td><td>12</td><td>13</td><td>8</td><td>7</td><td>1</td><td>6</td><td>3</td><td>1</td><td>2</td><td>18</td><td>3</td><td>2</td><td>&hellip;</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_02.txt&quot;</td><td>419</td><td>263</td><td>187</td><td>219</td><td>229</td><td>129</td><td>62</td><td>70</td><td>137</td><td>75</td><td>72</td><td>61</td><td>17</td><td>33</td><td>21</td><td>74</td><td>54</td><td>54</td><td>48</td><td>43</td><td>49</td><td>17</td><td>15</td><td>36</td><td>11</td><td>40</td><td>25</td><td>30</td><td>15</td><td>15</td><td>21</td><td>14</td><td>12</td><td>2</td><td>14</td><td>14</td><td>&hellip;</td><td>0</td><td>0</td><td>4</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_03.txt&quot;</td><td>1345</td><td>816</td><td>377</td><td>701</td><td>825</td><td>330</td><td>353</td><td>354</td><td>257</td><td>188</td><td>124</td><td>166</td><td>353</td><td>90</td><td>98</td><td>148</td><td>89</td><td>79</td><td>87</td><td>133</td><td>73</td><td>74</td><td>41</td><td>59</td><td>40</td><td>45</td><td>73</td><td>52</td><td>27</td><td>35</td><td>66</td><td>36</td><td>41</td><td
>13</td><td>14</td><td>28</td><td>&hellip;</td><td>0</td><td>1</td><td>0</td><td>6</td><td>4</td><td>0</td><td>0</td><td>20</td><td>1</td><td>2</td><td>1</td><td>0</td><td>0</td><td>0</td><td>4</td><td>2</td><td>0</td><td>2</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_04.txt&quot;</td><td>270</td><td>102</td><td>90</td><td>76</td><td>111</td><td>38</td><td>26</td><td>41</td><td>40</td><td>36</td><td>28</td><td>73</td><td>46</td><td>24</td><td>18</td><td>30</td><td>17</td><td>11</td><td>8</td><td>5</td><td>28</td><td>9</td><td>5</td><td>10</td><td>27</td><td>6</td><td>8</td><td>22</td><td>7</td><td>14</td><td>6</td><td>8</td><td>10</td><td>9</td><td>0</td><td>12</td><td>&hellip;</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_05.txt&quot;</td><td>508</td><td>196</td><td>199</td><td>148</td><td>128</td><td>70</td><td>20</td><td>48</td><td>41</td><td>41</td><td>63</td><td>78</td><td>38</td><td>24</td><td>43</td><td>40</td><td>45</td><td>56</td><td>10</td><td>25</td><td>39</td><td>12</td><td>1</td><td>29</td><td>23</td><td>13</td><td>16</td><td>23</td><td>5</td><td>10</td><td>10</td><td>16</td><td>2</td><td>14</td><td>9</td><td>5</td><td>&hellip;</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><t
d>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_06.txt&quot;</td><td>708</td><td>288</td><td>240</td><td>268</td><td>271</td><td>121</td><td>34</td><td>70</td><td>101</td><td>125</td><td>78</td><td>90</td><td>24</td><td>68</td><td>73</td><td>83</td><td>57</td><td>64</td><td>34</td><td>43</td><td>44</td><td>15</td><td>5</td><td>24</td><td>26</td><td>16</td><td>31</td><td>31</td><td>3</td><td>18</td><td>31</td><td>28</td><td>8</td><td>3</td><td>9</td><td>20</td><td>&hellip;</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_07.txt&quot;</td><td>1197</td><td>534</td><td>352</td><td>391</td><td>509</td><td>175</td><td>159</td><td>219</td><td>204</td><td>169</td><td>137</td><td>217</td><td>82</td><td>93</td><td>72</td><td>177</td><td>121</td><td>64</td><td>61</td><td>69</td><td>69</td><td>24</td><td>13</td><td>75</td><td>81</td><td>45</td><td>32</td><td>96</td><td>4</td><td>55</td><td>73</td><td>29</td><td>9</td><td>13</td><td>11</td><td>33</td><td>&hellip;</td><td>0</td><td>0</td><td>0</td><td>4</td><td>0</td><td>2</td><td>0</td><td>0</td><td>1</td><td>1</td><td>8</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_08.txt&quot;</td><td>171</td><td>56</td><td>51</td><td>103</td><td>55</td><td>26</td><td>71</td><td>44</td><td>38</td><td>52</td><td>25</td><td>17</td><td>4</td><td>39</td><td>28</td><td>19</td><td>38</td><td>20</td><td>19</td><td>5</td><td>9</td><td>12<
/td><td>20</td><td>7</td><td>12</td><td>13</td><td>8</td><td>4</td><td>21</td><td>4</td><td>7</td><td>6</td><td>11</td><td>7</td><td>4</td><td>2</td><td>&hellip;</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_09.txt&quot;</td><td>307</td><td>153</td><td>196</td><td>165</td><td>108</td><td>94</td><td>281</td><td>83</td><td>74</td><td>46</td><td>42</td><td>76</td><td>27</td><td>50</td><td>36</td><td>27</td><td>10</td><td>24</td><td>44</td><td>11</td><td>18</td><td>95</td><td>65</td><td>40</td><td>36</td><td>17</td><td>24</td><td>13</td><td>16</td><td>15</td><td>1</td><td>4</td><td>2</td><td>53</td><td>9</td><td>7</td><td>&hellip;</td><td>12</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>2</td><td>3</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>3</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_10.txt&quot;</td><td>1033</td><td>482</td><td>455</td><td>510</td><td>231</td><td>286</td><td>311</td><td>153</td><td>240</td><td>107</td><td>201</td><td>120</td><td>56</td><td>78</td><td>98</td><td>59</td><td>101</td><td>156</td><td>80</td><td>52</td><td>102</td><td>52</td><td>68</td><td>51</td><td>32</td><td>48</td><td>32</td><td>29</td><td>41</td><td>21</td><td>21</td><td>43</td><td>10</td><td>24</td><td>31</td><td>27</td><td>&hellip;</td><td>4</td><td>6</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>2</td><td>4</td><td>4</td><td>0</td><td>0</td><td>0</td><td>0</td><td>4</td><td>0</t
d><td>2</td><td>2</td><td>1</td><td>0</td><td>1</td><td>2</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 127)\n",
       "┌─────────────┬──────┬─────┬─────┬───┬──────┬─────┬──────┬──────┐\n",
       "│ doc_id      ┆ NN1  ┆ JJ  ┆ AT  ┆ … ┆ VVGK ┆ RPK ┆ RGQV ┆ RRQV │\n",
       "│ ---         ┆ ---  ┆ --- ┆ --- ┆   ┆ ---  ┆ --- ┆ ---  ┆ ---  │\n",
       "│ str         ┆ u32  ┆ u32 ┆ u32 ┆   ┆ u32  ┆ u32 ┆ u32  ┆ u32  │\n",
       "╞═════════════╪══════╪═════╪═════╪═══╪══════╪═════╪══════╪══════╡\n",
       "│ acad_01.txt ┆ 252  ┆ 62  ┆ 99  ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_02.txt ┆ 419  ┆ 263 ┆ 187 ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_03.txt ┆ 1345 ┆ 816 ┆ 377 ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_04.txt ┆ 270  ┆ 102 ┆ 90  ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_05.txt ┆ 508  ┆ 196 ┆ 199 ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_06.txt ┆ 708  ┆ 288 ┆ 240 ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_07.txt ┆ 1197 ┆ 534 ┆ 352 ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_08.txt ┆ 171  ┆ 56  ┆ 51  ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_09.txt ┆ 307  ┆ 153 ┆ 196 ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "│ acad_10.txt ┆ 1033 ┆ 482 ┆ 455 ┆ … ┆ 0    ┆ 0   ┆ 0    ┆ 0    │\n",
       "└─────────────┴──────┴─────┴─────┴───┴──────┴─────┴──────┴──────┘"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tm.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "119326c5",
   "metadata": {},
   "source": [
    "A similar document-term matrix (DTM) can be created for DocuScope categories by setting `count_by` to `'ds'`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "42ce2bee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 38)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th><th>Untagged</th><th>AcademicTerms</th><th>Character</th><th>Narrative</th><th>Description</th><th>InformationExposition</th><th>InformationTopics</th><th>Negative</th><th>Positive</th><th>MetadiscourseCohesive</th><th>Reasoning</th><th>ForceStressed</th><th>PublicTerms</th><th>Strategic</th><th>InformationStates</th><th>InformationChange</th><th>ConfidenceHedged</th><th>InformationReportVerbs</th><th>Citation</th><th>InformationPlace</th><th>Interactive</th><th>Inquiry</th><th>Future</th><th>ConfidenceHigh</th><th>Contingent</th><th>AcademicWritingMoves</th><th>Facilitate</th><th>MetadiscourseInteractive</th><th>Updates</th><th>InformationChangePositive</th><th>CitationAuthority</th><th>FirstPerson</th><th>Responsibility</th><th>InformationChangeNegative</th><th>Uncertainty</th><th>ConfidenceLow</th><th>CitationHedged</th></tr><tr><td>str</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>324</td><td>127</td><td>15</td><td>66</td><td>70</td><td>57</td><td>15</td><td>10</td><td>9</td><td>12</td><td>26</td><td>7</td><td>4</td><td>10</td><td>9</td><td>10</td><td>15</td><td>17</td><td>0</td><td>0</td><td>3</td><td>18</td><td>3</td><td>3</td><td>0</td><td>16</td><td>1</td><td>3</td><td>0</td><td>1</td><td>2</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_02.txt&quot;</td><td>760</td><td>255</td><td>79</td><td>133</td><td>132</td><td>157</td><td>74</td><td>67</td><td>66</td><td>97<
/td><td>51</td><td>54</td><td>18</td><td>24</td><td>33</td><td>40</td><td>60</td><td>38</td><td>12</td><td>9</td><td>22</td><td>8</td><td>20</td><td>20</td><td>38</td><td>5</td><td>7</td><td>3</td><td>8</td><td>26</td><td>3</td><td>9</td><td>0</td><td>2</td><td>1</td><td>1</td><td>1</td></tr><tr><td>&quot;acad_03.txt&quot;</td><td>2392</td><td>844</td><td>465</td><td>422</td><td>435</td><td>428</td><td>240</td><td>201</td><td>160</td><td>142</td><td>160</td><td>126</td><td>52</td><td>78</td><td>124</td><td>130</td><td>137</td><td>57</td><td>415</td><td>49</td><td>39</td><td>82</td><td>42</td><td>30</td><td>43</td><td>20</td><td>28</td><td>31</td><td>21</td><td>47</td><td>23</td><td>42</td><td>3</td><td>32</td><td>9</td><td>1</td><td>3</td></tr><tr><td>&quot;acad_04.txt&quot;</td><td>373</td><td>72</td><td>28</td><td>64</td><td>161</td><td>73</td><td>29</td><td>31</td><td>42</td><td>39</td><td>35</td><td>17</td><td>22</td><td>35</td><td>12</td><td>12</td><td>19</td><td>23</td><td>3</td><td>9</td><td>7</td><td>6</td><td>11</td><td>4</td><td>6</td><td>24</td><td>12</td><td>1</td><td>1</td><td>2</td><td>2</td><td>1</td><td>2</td><td>1</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_05.txt&quot;</td><td>651</td><td>200</td><td>47</td><td>133</td><td>172</td><td>79</td><td>77</td><td>73</td><td>18</td><td>42</td><td>52</td><td>33</td><td>2</td><td>14</td><td>33</td><td>65</td><td>21</td><td>27</td><td>3</td><td>0</td><td>7</td><td>10</td><td>21</td><td>5</td><td>19</td><td>17</td><td>7</td><td>5</td><td>3</td><td>0</td><td>0</td><td>1</td><td>2</td><td>0</td><td>0</td><td>1</td><td>0</td></tr><tr><td>&quot;acad_06.txt&quot;</td><td>777</td><td>188</td><td>99</td><td>107</td><td>420</td><td>101</td><td>72</td><td>131</td><td>84</td><td>106</td><td>54</td><td>55</td><td>32</td><td>41</td><td>55</td><td>39</td><td>65</td><td>30</td><td>16</td><td>23</td><td>16</td><td>7</td><td>23</td><td>19</td><td>30</td><td>11</td><td>14</td><td>5</td><td>7</td><td>29</td><td>14
</td><td>0</td><td>23</td><td>27</td><td>0</td><td>1</td><td>0</td></tr><tr><td>&quot;acad_07.txt&quot;</td><td>1621</td><td>395</td><td>159</td><td>245</td><td>556</td><td>285</td><td>291</td><td>126</td><td>153</td><td>137</td><td>84</td><td>101</td><td>47</td><td>82</td><td>123</td><td>61</td><td>104</td><td>88</td><td>23</td><td>35</td><td>45</td><td>11</td><td>86</td><td>36</td><td>54</td><td>28</td><td>25</td><td>14</td><td>22</td><td>25</td><td>6</td><td>4</td><td>13</td><td>2</td><td>8</td><td>2</td><td>2</td></tr><tr><td>&quot;acad_08.txt&quot;</td><td>292</td><td>60</td><td>78</td><td>48</td><td>27</td><td>36</td><td>20</td><td>33</td><td>65</td><td>21</td><td>26</td><td>34</td><td>37</td><td>10</td><td>30</td><td>22</td><td>7</td><td>18</td><td>4</td><td>2</td><td>4</td><td>5</td><td>16</td><td>6</td><td>3</td><td>0</td><td>7</td><td>2</td><td>1</td><td>3</td><td>3</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr><tr><td>&quot;acad_09.txt&quot;</td><td>645</td><td>59</td><td>360</td><td>171</td><td>100</td><td>59</td><td>20</td><td>128</td><td>71</td><td>35</td><td>27</td><td>41</td><td>46</td><td>47</td><td>7</td><td>7</td><td>12</td><td>13</td><td>19</td><td>72</td><td>7</td><td>3</td><td>9</td><td>21</td><td>18</td><td>1</td><td>8</td><td>3</td><td>7</td><td>3</td><td>3</td><td>0</td><td>11</td><td>4</td><td>5</td><td>0</td><td>2</td></tr><tr><td>&quot;acad_10.txt&quot;</td><td>1948</td><td>466</td><td>483</td><td>319</td><td>226</td><td>238</td><td>79</td><td>111</td><td>119</td><td>106</td><td>80</td><td>127</td><td>54</td><td>63</td><td>71</td><td>22</td><td>45</td><td>23</td><td>39</td><td>57</td><td>88</td><td>31</td><td>28</td><td>50</td><td>15</td><td>9</td><td>10</td><td>36</td><td>13</td><td>15</td><td>19</td><td>11</td><td>1</td><td>4</td><td>4</td><td>0</td><td>0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 38)\n",
       "┌───────────┬──────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬───────────┐\n",
       "│ doc_id    ┆ Untagged ┆ AcademicT ┆ Character ┆ … ┆ Informati ┆ Uncertain ┆ Confidenc ┆ CitationH │\n",
       "│ ---       ┆ ---      ┆ erms      ┆ ---       ┆   ┆ onChangeN ┆ ty        ┆ eLow      ┆ edged     │\n",
       "│ str       ┆ u32      ┆ ---       ┆ u32       ┆   ┆ egative   ┆ ---       ┆ ---       ┆ ---       │\n",
       "│           ┆          ┆ u32       ┆           ┆   ┆ ---       ┆ u32       ┆ u32       ┆ u32       │\n",
       "│           ┆          ┆           ┆           ┆   ┆ u32       ┆           ┆           ┆           │\n",
       "╞═══════════╪══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n",
       "│ acad_01.t ┆ 324      ┆ 127       ┆ 15        ┆ … ┆ 0         ┆ 0         ┆ 0         ┆ 0         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_02.t ┆ 760      ┆ 255       ┆ 79        ┆ … ┆ 2         ┆ 1         ┆ 1         ┆ 1         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_03.t ┆ 2392     ┆ 844       ┆ 465       ┆ … ┆ 32        ┆ 9         ┆ 1         ┆ 3         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_04.t ┆ 373      ┆ 72        ┆ 28        ┆ … ┆ 1         ┆ 0         ┆ 0         ┆ 0         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_05.t ┆ 651      ┆ 200       ┆ 47        ┆ … ┆ 0         ┆ 0         ┆ 1         ┆ 0         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_06.t ┆ 777      ┆ 188       ┆ 99        ┆ … ┆ 27        ┆ 0         ┆ 1         ┆ 0         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_07.t ┆ 1621     ┆ 395       ┆ 159       ┆ … ┆ 2         ┆ 8         ┆ 2         ┆ 2         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_08.t ┆ 292      ┆ 60        ┆ 78        ┆ … ┆ 0         ┆ 0         ┆ 0         ┆ 0         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_09.t ┆ 645      ┆ 59        ┆ 360       ┆ … ┆ 4         ┆ 5         ┆ 0         ┆ 2         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_10.t ┆ 1948     ┆ 466       ┆ 483       ┆ … ┆ 4         ┆ 4         ┆ 0         ┆ 0         │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "└───────────┴──────────┴───────────┴───────────┴───┴───────────┴───────────┴───────────┴───────────┘"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tm = ds.tags_dtm(ds_tokens, count_by='ds')\n",
    "tm.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6c1c257-0bc9-4804-aa62-a942bd6b774e",
   "metadata": {},
   "source": [
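    "Counts can also be normalized using the `dtm_weight` function. The `scheme` argument can be set to `'prop'`, `'scale'`, or `'tfidf'`. With `'prop'`, each count is divided by the document's total count, so every row sums to 1 (for example, 324 of the 855 tokens counted in acad_01.txt are Untagged, giving 0.378947)."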
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "6d6eb787",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 38)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th><th>Untagged</th><th>AcademicTerms</th><th>Character</th><th>Narrative</th><th>Description</th><th>InformationExposition</th><th>InformationTopics</th><th>Negative</th><th>Positive</th><th>MetadiscourseCohesive</th><th>Reasoning</th><th>ForceStressed</th><th>PublicTerms</th><th>Strategic</th><th>InformationStates</th><th>InformationChange</th><th>ConfidenceHedged</th><th>InformationReportVerbs</th><th>Citation</th><th>InformationPlace</th><th>Interactive</th><th>Inquiry</th><th>Future</th><th>ConfidenceHigh</th><th>Contingent</th><th>AcademicWritingMoves</th><th>Facilitate</th><th>MetadiscourseInteractive</th><th>Updates</th><th>InformationChangePositive</th><th>CitationAuthority</th><th>FirstPerson</th><th>Responsibility</th><th>InformationChangeNegative</th><th>Uncertainty</th><th>ConfidenceLow</th><th>CitationHedged</th></tr><tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>0.378947</td><td>0.148538</td><td>0.017544</td><td>0.077193</td><td>0.081871</td><td>0.066667</td><td>0.017544</td><td>0.011696</td><td>0.010526</td><td>0.014035</td><td>0.030409</td><td>0.008187</td><td>0.004678</td><td>0.011696</td><td>0.010526</td><td>0.011696</td><td>0.017544</td><td>0.019883</td><td>0.0</td><td>0.0</td><td>0.003509</td><td>0.021053</td><td>0.003509</td><td>0.003509</td><td>0.0</td><td>0.018713</td><td>0.00117</td><td>0.003509</td><td>0.0</td><td>0.00117</td><td>0.002339</td><td>0.0</td><td>0.002339</td><td>0.
0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr><tr><td>&quot;acad_02.txt&quot;</td><td>0.325761</td><td>0.109301</td><td>0.033862</td><td>0.057008</td><td>0.05658</td><td>0.067295</td><td>0.031719</td><td>0.028718</td><td>0.02829</td><td>0.041577</td><td>0.02186</td><td>0.023146</td><td>0.007715</td><td>0.010287</td><td>0.014145</td><td>0.017145</td><td>0.025718</td><td>0.016288</td><td>0.005144</td><td>0.003858</td><td>0.00943</td><td>0.003429</td><td>0.008573</td><td>0.008573</td><td>0.016288</td><td>0.002143</td><td>0.003</td><td>0.001286</td><td>0.003429</td><td>0.011144</td><td>0.001286</td><td>0.003858</td><td>0.0</td><td>0.000857</td><td>0.000429</td><td>0.000429</td><td>0.000429</td></tr><tr><td>&quot;acad_03.txt&quot;</td><td>0.316695</td><td>0.111744</td><td>0.061565</td><td>0.055872</td><td>0.057593</td><td>0.056666</td><td>0.031775</td><td>0.026612</td><td>0.021184</td><td>0.0188</td><td>0.021184</td><td>0.016682</td><td>0.006885</td><td>0.010327</td><td>0.016417</td><td>0.017212</td><td>0.018138</td><td>0.007547</td><td>0.054945</td><td>0.006487</td><td>0.005164</td><td>0.010857</td><td>0.005561</td><td>0.003972</td><td>0.005693</td><td>0.002648</td><td>0.003707</td><td>0.004104</td><td>0.00278</td><td>0.006223</td><td>0.003045</td><td>0.005561</td><td>0.000397</td><td>0.004237</td><td>0.001192</td><td>0.000132</td><td>0.000397</td></tr><tr><td>&quot;acad_04.txt&quot;</td><td>0.31637</td><td>0.061069</td><td>0.023749</td><td>0.054283</td><td>0.136556</td><td>0.061917</td><td>0.024597</td><td>0.026293</td><td>0.035623</td><td>0.033079</td><td>0.029686</td><td>0.014419</td><td>0.01866</td><td>0.029686</td><td>0.010178</td><td>0.010178</td><td>0.016115</td><td>0.019508</td><td>0.002545</td><td>0.007634</td><td>0.005937</td><td>0.005089</td><td>0.00933</td><td>0.003393</td><td>0.005089</td><td>0.020356</td><td>0.010178</td><td>0.000848</td><td>0.000848</td><td>0.001696</td><td>0.001696</td><td>0.000848</td><td>0.001696</td><td>0.000848</td><td>0.0</td><td>0
.0</td><td>0.0</td></tr><tr><td>&quot;acad_05.txt&quot;</td><td>0.353804</td><td>0.108696</td><td>0.025543</td><td>0.072283</td><td>0.093478</td><td>0.042935</td><td>0.041848</td><td>0.039674</td><td>0.009783</td><td>0.022826</td><td>0.028261</td><td>0.017935</td><td>0.001087</td><td>0.007609</td><td>0.017935</td><td>0.035326</td><td>0.011413</td><td>0.014674</td><td>0.00163</td><td>0.0</td><td>0.003804</td><td>0.005435</td><td>0.011413</td><td>0.002717</td><td>0.010326</td><td>0.009239</td><td>0.003804</td><td>0.002717</td><td>0.00163</td><td>0.0</td><td>0.0</td><td>0.000543</td><td>0.001087</td><td>0.0</td><td>0.0</td><td>0.000543</td><td>0.0</td></tr><tr><td>&quot;acad_06.txt&quot;</td><td>0.285557</td><td>0.069092</td><td>0.036384</td><td>0.039324</td><td>0.154355</td><td>0.037119</td><td>0.026461</td><td>0.048144</td><td>0.030871</td><td>0.038956</td><td>0.019846</td><td>0.020213</td><td>0.01176</td><td>0.015068</td><td>0.020213</td><td>0.014333</td><td>0.023888</td><td>0.011025</td><td>0.00588</td><td>0.008453</td><td>0.00588</td><td>0.002573</td><td>0.008453</td><td>0.006983</td><td>0.011025</td><td>0.004043</td><td>0.005145</td><td>0.001838</td><td>0.002573</td><td>0.010658</td><td>0.005145</td><td>0.0</td><td>0.008453</td><td>0.009923</td><td>0.0</td><td>0.000368</td><td>0.0</td></tr><tr><td>&quot;acad_07.txt&quot;</td><td>0.317905</td><td>0.077466</td><td>0.031183</td><td>0.048049</td><td>0.109041</td><td>0.055893</td><td>0.05707</td><td>0.024711</td><td>0.030006</td><td>0.026868</td><td>0.016474</td><td>0.019808</td><td>0.009217</td><td>0.016082</td><td>0.024122</td><td>0.011963</td><td>0.020396</td><td>0.017258</td><td>0.004511</td><td>0.006864</td><td>0.008825</td><td>0.002157</td><td>0.016866</td><td>0.00706</td><td>0.01059</td><td>0.005491</td><td>0.004903</td><td>0.002746</td><td>0.004315</td><td>0.004903</td><td>0.001177</td><td>0.000784</td><td>0.00255</td><td>0.000392</td><td>0.001569</td><td>0.000392</td><td>0.000392</td></tr><tr><td>&quot;acad_0
8.txt&quot;</td><td>0.317391</td><td>0.065217</td><td>0.084783</td><td>0.052174</td><td>0.029348</td><td>0.03913</td><td>0.021739</td><td>0.03587</td><td>0.070652</td><td>0.022826</td><td>0.028261</td><td>0.036957</td><td>0.040217</td><td>0.01087</td><td>0.032609</td><td>0.023913</td><td>0.007609</td><td>0.019565</td><td>0.004348</td><td>0.002174</td><td>0.004348</td><td>0.005435</td><td>0.017391</td><td>0.006522</td><td>0.003261</td><td>0.0</td><td>0.007609</td><td>0.002174</td><td>0.001087</td><td>0.003261</td><td>0.003261</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr><tr><td>&quot;acad_09.txt&quot;</td><td>0.315558</td><td>0.028865</td><td>0.176125</td><td>0.083659</td><td>0.048924</td><td>0.028865</td><td>0.009785</td><td>0.062622</td><td>0.034736</td><td>0.017123</td><td>0.013209</td><td>0.020059</td><td>0.022505</td><td>0.022994</td><td>0.003425</td><td>0.003425</td><td>0.005871</td><td>0.00636</td><td>0.009295</td><td>0.035225</td><td>0.003425</td><td>0.001468</td><td>0.004403</td><td>0.010274</td><td>0.008806</td><td>0.000489</td><td>0.003914</td><td>0.001468</td><td>0.003425</td><td>0.001468</td><td>0.001468</td><td>0.0</td><td>0.005382</td><td>0.001957</td><td>0.002446</td><td>0.0</td><td>0.000978</td></tr><tr><td>&quot;acad_10.txt&quot;</td><td>0.388822</td><td>0.093014</td><td>0.096407</td><td>0.063673</td><td>0.04511</td><td>0.047505</td><td>0.015768</td><td>0.022156</td><td>0.023752</td><td>0.021158</td><td>0.015968</td><td>0.025349</td><td>0.010778</td><td>0.012575</td><td>0.014172</td><td>0.004391</td><td>0.008982</td><td>0.004591</td><td>0.007784</td><td>0.011377</td><td>0.017565</td><td>0.006188</td><td>0.005589</td><td>0.00998</td><td>0.002994</td><td>0.001796</td><td>0.001996</td><td>0.007186</td><td>0.002595</td><td>0.002994</td><td>0.003792</td><td>0.002196</td><td>0.0002</td><td>0.000798</td><td>0.000798</td><td>0.0</td><td>0.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 38)\n",
       "┌───────────┬──────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬───────────┐\n",
       "│ doc_id    ┆ Untagged ┆ AcademicT ┆ Character ┆ … ┆ Informati ┆ Uncertain ┆ Confidenc ┆ CitationH │\n",
       "│ ---       ┆ ---      ┆ erms      ┆ ---       ┆   ┆ onChangeN ┆ ty        ┆ eLow      ┆ edged     │\n",
       "│ str       ┆ f64      ┆ ---       ┆ f64       ┆   ┆ egative   ┆ ---       ┆ ---       ┆ ---       │\n",
       "│           ┆          ┆ f64       ┆           ┆   ┆ ---       ┆ f64       ┆ f64       ┆ f64       │\n",
       "│           ┆          ┆           ┆           ┆   ┆ f64       ┆           ┆           ┆           │\n",
       "╞═══════════╪══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n",
       "│ acad_01.t ┆ 0.378947 ┆ 0.148538  ┆ 0.017544  ┆ … ┆ 0.0       ┆ 0.0       ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_02.t ┆ 0.325761 ┆ 0.109301  ┆ 0.033862  ┆ … ┆ 0.000857  ┆ 0.000429  ┆ 0.000429  ┆ 0.000429  │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_03.t ┆ 0.316695 ┆ 0.111744  ┆ 0.061565  ┆ … ┆ 0.004237  ┆ 0.001192  ┆ 0.000132  ┆ 0.000397  │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_04.t ┆ 0.31637  ┆ 0.061069  ┆ 0.023749  ┆ … ┆ 0.000848  ┆ 0.0       ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_05.t ┆ 0.353804 ┆ 0.108696  ┆ 0.025543  ┆ … ┆ 0.0       ┆ 0.0       ┆ 0.000543  ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_06.t ┆ 0.285557 ┆ 0.069092  ┆ 0.036384  ┆ … ┆ 0.009923  ┆ 0.0       ┆ 0.000368  ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_07.t ┆ 0.317905 ┆ 0.077466  ┆ 0.031183  ┆ … ┆ 0.000392  ┆ 0.001569  ┆ 0.000392  ┆ 0.000392  │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_08.t ┆ 0.317391 ┆ 0.065217  ┆ 0.084783  ┆ … ┆ 0.0       ┆ 0.0       ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_09.t ┆ 0.315558 ┆ 0.028865  ┆ 0.176125  ┆ … ┆ 0.001957  ┆ 0.002446  ┆ 0.0       ┆ 0.000978  │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_10.t ┆ 0.388822 ┆ 0.093014  ┆ 0.096407  ┆ … ┆ 0.000798  ┆ 0.000798  ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "└───────────┴──────────┴───────────┴───────────┴───┴───────────┴───────────┴───────────┴───────────┘"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "norm_tm = ds.dtm_weight(tm, scheme='prop')\n",
    "norm_tm.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "f9424743-714b-4b99-8cd9-8525005ce77e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 38)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th><th>Untagged</th><th>AcademicTerms</th><th>Character</th><th>Narrative</th><th>Description</th><th>InformationExposition</th><th>InformationTopics</th><th>Negative</th><th>Positive</th><th>MetadiscourseCohesive</th><th>Reasoning</th><th>ForceStressed</th><th>PublicTerms</th><th>Strategic</th><th>InformationStates</th><th>InformationChange</th><th>ConfidenceHedged</th><th>InformationReportVerbs</th><th>Citation</th><th>InformationPlace</th><th>Interactive</th><th>Inquiry</th><th>Future</th><th>ConfidenceHigh</th><th>Contingent</th><th>AcademicWritingMoves</th><th>Facilitate</th><th>MetadiscourseInteractive</th><th>Updates</th><th>InformationChangePositive</th><th>CitationAuthority</th><th>FirstPerson</th><th>Responsibility</th><th>InformationChangeNegative</th><th>Uncertainty</th><th>ConfidenceLow</th><th>CitationHedged</th></tr><tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>0.258933</td><td>0.101495</td><td>0.011988</td><td>0.052746</td><td>0.055942</td><td>0.045553</td><td>0.01216</td><td>0.007992</td><td>0.007193</td><td>0.00959</td><td>0.020779</td><td>0.005594</td><td>0.003197</td><td>0.007992</td><td>0.007403</td><td>0.007992</td><td>0.011988</td><td>0.013586</td><td>0.0</td><td>0.0</td><td>0.002432</td><td>0.014593</td><td>0.002504</td><td>0.002398</td><td>0.0</td><td>0.013357</td><td>0.000811</td><td>0.002398</td><td>0.0</td><td>0.000874</td><td>0.001834</td><td>0.0</td><td>0.001964</td><td>0.
0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr><tr><td>&quot;acad_02.txt&quot;</td><td>0.222591</td><td>0.074685</td><td>0.023138</td><td>0.038953</td><td>0.03866</td><td>0.045983</td><td>0.021986</td><td>0.019623</td><td>0.01933</td><td>0.02841</td><td>0.014937</td><td>0.015816</td><td>0.005272</td><td>0.007029</td><td>0.009948</td><td>0.011715</td><td>0.017573</td><td>0.01113</td><td>0.003843</td><td>0.002928</td><td>0.006536</td><td>0.002377</td><td>0.006119</td><td>0.005858</td><td>0.011455</td><td>0.00153</td><td>0.00208</td><td>0.000879</td><td>0.002412</td><td>0.008327</td><td>0.001008</td><td>0.003558</td><td>0.0</td><td>0.00092</td><td>0.000395</td><td>0.000607</td><td>0.000734</td></tr><tr><td>&quot;acad_03.txt&quot;</td><td>0.216396</td><td>0.076354</td><td>0.042067</td><td>0.038177</td><td>0.039353</td><td>0.03872</td><td>0.022025</td><td>0.018184</td><td>0.014475</td><td>0.012846</td><td>0.014475</td><td>0.011399</td><td>0.004704</td><td>0.007056</td><td>0.011546</td><td>0.011761</td><td>0.012394</td><td>0.005157</td><td>0.041056</td><td>0.004925</td><td>0.003579</td><td>0.007525</td><td>0.003969</td><td>0.002714</td><td>0.004004</td><td>0.00189</td><td>0.00257</td><td>0.002804</td><td>0.001955</td><td>0.00465</td><td>0.002388</td><td>0.005129</td><td>0.000334</td><td>0.004544</td><td>0.001099</td><td>0.000188</td><td>0.00068</td></tr><tr><td>&quot;acad_04.txt&quot;</td><td>0.216174</td><td>0.041728</td><td>0.016228</td><td>0.037091</td><td>0.093308</td><td>0.042307</td><td>0.017049</td><td>0.017966</td><td>0.024341</td><td>0.022603</td><td>0.020284</td><td>0.009852</td><td>0.01275</td><td>0.020284</td><td>0.007158</td><td>0.006955</td><td>0.011012</td><td>0.01333</td><td>0.001901</td><td>0.005795</td><td>0.004115</td><td>0.003527</td><td>0.006659</td><td>0.002318</td><td>0.003579</td><td>0.01453</td><td>0.007055</td><td>0.00058</td><td>0.000597</td><td>0.001268</td><td>0.00133</td><td>0.000782</td><td>0.001425</td><td>0.00091</td><td>0.0</td><td>0.0</t
d><td>0.0</td></tr><tr><td>&quot;acad_05.txt&quot;</td><td>0.241753</td><td>0.074271</td><td>0.017454</td><td>0.04939</td><td>0.063873</td><td>0.029337</td><td>0.029007</td><td>0.027109</td><td>0.006684</td><td>0.015597</td><td>0.019311</td><td>0.012255</td><td>0.000743</td><td>0.005199</td><td>0.012614</td><td>0.024138</td><td>0.007798</td><td>0.010027</td><td>0.001218</td><td>0.0</td><td>0.002637</td><td>0.003767</td><td>0.008146</td><td>0.001857</td><td>0.007262</td><td>0.006595</td><td>0.002637</td><td>0.001857</td><td>0.001147</td><td>0.0</td><td>0.0</td><td>0.000501</td><td>0.000913</td><td>0.0</td><td>0.0</td><td>0.00077</td><td>0.0</td></tr><tr><td>&quot;acad_06.txt&quot;</td><td>0.195119</td><td>0.04721</td><td>0.024861</td><td>0.02687</td><td>0.10547</td><td>0.025363</td><td>0.018341</td><td>0.032897</td><td>0.021094</td><td>0.026619</td><td>0.01356</td><td>0.013812</td><td>0.008036</td><td>0.010296</td><td>0.014216</td><td>0.009794</td><td>0.016323</td><td>0.007534</td><td>0.004394</td><td>0.006417</td><td>0.004076</td><td>0.001783</td><td>0.006033</td><td>0.004771</td><td>0.007754</td><td>0.002885</td><td>0.003566</td><td>0.001256</td><td>0.001809</td><td>0.007964</td><td>0.004034</td><td>0.0</td><td>0.007098</td><td>0.010644</td><td>0.0</td><td>0.000521</td><td>0.0</td></tr><tr><td>&quot;acad_07.txt&quot;</td><td>0.217223</td><td>0.052932</td><td>0.021307</td><td>0.032831</td><td>0.074507</td><td>0.038192</td><td>0.039558</td><td>0.016885</td><td>0.020503</td><td>0.018359</td><td>0.011256</td><td>0.013535</td><td>0.006298</td><td>0.010988</td><td>0.016965</td><td>0.008174</td><td>0.013937</td><td>0.011792</td><td>0.00337</td><td>0.005211</td><td>0.006117</td><td>0.001495</td><td>0.012038</td><td>0.004824</td><td>0.007448</td><td>0.003919</td><td>0.003398</td><td>0.001876</td><td>0.003034</td><td>0.003664</td><td>0.000923</td><td>0.000724</td><td>0.002141</td><td>0.000421</td><td>0.001447</td><td>0.000556</td><td>0.000672</td></tr><tr><td>&quot;acad_08.t
xt&quot;</td><td>0.216872</td><td>0.044563</td><td>0.057932</td><td>0.03565</td><td>0.020053</td><td>0.026738</td><td>0.015068</td><td>0.024509</td><td>0.048276</td><td>0.015597</td><td>0.019311</td><td>0.025252</td><td>0.02748</td><td>0.007427</td><td>0.022934</td><td>0.01634</td><td>0.005199</td><td>0.013369</td><td>0.003249</td><td>0.00165</td><td>0.003014</td><td>0.003767</td><td>0.012413</td><td>0.004456</td><td>0.002293</td><td>0.0</td><td>0.005274</td><td>0.001485</td><td>0.000764</td><td>0.002437</td><td>0.002557</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td></tr><tr><td>&quot;acad_09.txt&quot;</td><td>0.215619</td><td>0.019723</td><td>0.120345</td><td>0.057164</td><td>0.033429</td><td>0.019723</td><td>0.006782</td><td>0.04279</td><td>0.023735</td><td>0.0117</td><td>0.009026</td><td>0.013706</td><td>0.015377</td><td>0.015712</td><td>0.002409</td><td>0.00234</td><td>0.004012</td><td>0.004346</td><td>0.006946</td><td>0.02674</td><td>0.002374</td><td>0.001017</td><td>0.003143</td><td>0.00702</td><td>0.006193</td><td>0.000349</td><td>0.002713</td><td>0.001003</td><td>0.002409</td><td>0.001097</td><td>0.001151</td><td>0.0</td><td>0.004519</td><td>0.002099</td><td>0.002256</td><td>0.0</td><td>0.001676</td></tr><tr><td>&quot;acad_10.txt&quot;</td><td>0.26568</td><td>0.063556</td><td>0.065875</td><td>0.043507</td><td>0.030823</td><td>0.03246</td><td>0.01093</td><td>0.015139</td><td>0.01623</td><td>0.014457</td><td>0.010911</td><td>0.017321</td><td>0.007365</td><td>0.008592</td><td>0.009967</td><td>0.003</td><td>0.006137</td><td>0.003137</td><td>0.005817</td><td>0.008637</td><td>0.012175</td><td>0.004289</td><td>0.003989</td><td>0.006819</td><td>0.002106</td><td>0.001282</td><td>0.001384</td><td>0.00491</td><td>0.001825</td><td>0.002237</td><td>0.002974</td><td>0.002025</td><td>0.000168</td><td>0.000856</td><td>0.000736</td><td>0.0</td><td>0.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 38)\n",
       "┌───────────┬──────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬───────────┐\n",
       "│ doc_id    ┆ Untagged ┆ AcademicT ┆ Character ┆ … ┆ Informati ┆ Uncertain ┆ Confidenc ┆ CitationH │\n",
       "│ ---       ┆ ---      ┆ erms      ┆ ---       ┆   ┆ onChangeN ┆ ty        ┆ eLow      ┆ edged     │\n",
       "│ str       ┆ f64      ┆ ---       ┆ f64       ┆   ┆ egative   ┆ ---       ┆ ---       ┆ ---       │\n",
       "│           ┆          ┆ f64       ┆           ┆   ┆ ---       ┆ f64       ┆ f64       ┆ f64       │\n",
       "│           ┆          ┆           ┆           ┆   ┆ f64       ┆           ┆           ┆           │\n",
       "╞═══════════╪══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n",
       "│ acad_01.t ┆ 0.258933 ┆ 0.101495  ┆ 0.011988  ┆ … ┆ 0.0       ┆ 0.0       ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_02.t ┆ 0.222591 ┆ 0.074685  ┆ 0.023138  ┆ … ┆ 0.00092   ┆ 0.000395  ┆ 0.000607  ┆ 0.000734  │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_03.t ┆ 0.216396 ┆ 0.076354  ┆ 0.042067  ┆ … ┆ 0.004544  ┆ 0.001099  ┆ 0.000188  ┆ 0.00068   │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_04.t ┆ 0.216174 ┆ 0.041728  ┆ 0.016228  ┆ … ┆ 0.00091   ┆ 0.0       ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_05.t ┆ 0.241753 ┆ 0.074271  ┆ 0.017454  ┆ … ┆ 0.0       ┆ 0.0       ┆ 0.00077   ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_06.t ┆ 0.195119 ┆ 0.04721   ┆ 0.024861  ┆ … ┆ 0.010644  ┆ 0.0       ┆ 0.000521  ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_07.t ┆ 0.217223 ┆ 0.052932  ┆ 0.021307  ┆ … ┆ 0.000421  ┆ 0.001447  ┆ 0.000556  ┆ 0.000672  │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_08.t ┆ 0.216872 ┆ 0.044563  ┆ 0.057932  ┆ … ┆ 0.0       ┆ 0.0       ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_09.t ┆ 0.215619 ┆ 0.019723  ┆ 0.120345  ┆ … ┆ 0.002099  ┆ 0.002256  ┆ 0.0       ┆ 0.001676  │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "│ acad_10.t ┆ 0.26568  ┆ 0.063556  ┆ 0.065875  ┆ … ┆ 0.000856  ┆ 0.000736  ┆ 0.0       ┆ 0.0       │\n",
       "│ xt        ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
       "└───────────┴──────────┴───────────┴───────────┴───┴───────────┴───────────┴───────────┴───────────┘"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tfidf_tm = ds.dtm_weight(tm, scheme='tfidf')\n",
    "tfidf_tm.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6856b77b",
   "metadata": {},
   "source": [
    "## KWIC tables\n",
    "\n",
    "There is also a function for generating Key Word in Context (KWIC) tables. For display purposes, the `kwic_center_node` function trims the context columns to a maximum of 75 characters.\n",
    "\n",
    "The function takes a table of parsed tokens of the type generated by the `docuscope_parse` function (here, `ds_tokens`). A node word needs to be set, and there is an option to ignore the case of the node word.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note: Other KWIC options**\n",
    "\n",
    "The **tmtoolkit** package has [its own KWIC functions](https://tmtoolkit.readthedocs.io/en/latest/preprocessing.html#Keywords-in-context-(KWIC)-and-general-filtering-methods). The only difference is that this function produces a table with the node word in a center column and context columns to the left and right. The **tmtoolkit** functions produce tables with a single column that includes the node word.\n",
    "  \n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "59d7e3af",
   "metadata": {},
   "outputs": [],
   "source": [
    "kcn = ds.kwic_center_node(ds_tokens, 'data', ignore_case=True, search_type='fixed')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "51c9dd2a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Doc ID</th><th>Pre-Node</th><th>Node</th><th>Post-Node</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;and the results were recorded …</td><td>&quot;data &quot;</td><td>&quot;chart. This was repeated for a…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;the surface. Table 1 shows the…</td><td>&quot;data &quot;</td><td>&quot;chart for the number of bubble…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;of sodium bicarbonate was calc…</td><td>&quot;data &quot;</td><td>&quot;can be seen below in Table 2&quot;</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;bicarbonate increased. As show…</td><td>&quot;data &quot;</td><td>&quot;in Tables 1 and 2 in the &quot;</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;is 10.8 bubbles. Based on the &quot;</td><td>&quot;data &quot;</td><td>&quot;shown in Table 1, it is &quot;</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 4)\n",
       "┌─────────────┬─────────────────────────────────┬───────┬─────────────────────────────────┐\n",
       "│ Doc ID      ┆ Pre-Node                        ┆ Node  ┆ Post-Node                       │\n",
       "│ ---         ┆ ---                             ┆ ---   ┆ ---                             │\n",
       "│ str         ┆ str                             ┆ str   ┆ str                             │\n",
       "╞═════════════╪═════════════════════════════════╪═══════╪═════════════════════════════════╡\n",
       "│ acad_01.txt ┆ and the results were recorded … ┆ data  ┆ chart. This was repeated for a… │\n",
       "│ acad_01.txt ┆ the surface. Table 1 shows the… ┆ data  ┆ chart for the number of bubble… │\n",
       "│ acad_01.txt ┆ of sodium bicarbonate was calc… ┆ data  ┆ can be seen below in Table 2    │\n",
       "│ acad_01.txt ┆ bicarbonate increased. As show… ┆ data  ┆ in Tables 1 and 2 in the        │\n",
       "│ acad_01.txt ┆ is 10.8 bubbles. Based on the   ┆ data  ┆ shown in Table 1, it is         │\n",
       "└─────────────┴─────────────────────────────────┴───────┴─────────────────────────────────┘"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kcn.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc30d78d",
   "metadata": {},
   "source": [
    "There is also an option for matching tokens that contain a character sequence at the beginning or end of the token, which is set by changing the `search_type` argument:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "42a7fd3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "kwc = ds.kwic_center_node(ds_tokens, 'tion', ignore_case=True, search_type='ends_with')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "a3521576",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Doc ID</th><th>Pre-Node</th><th>Node</th><th>Post-Node</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;photosynthesis. This process o…</td><td>&quot;fixation &quot;</td><td>&quot;of carbon dioxide in the prese…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;The end result of photosynthes…</td><td>&quot;production &quot;</td><td>&quot;of organic materials, such as …</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;factor to be tested would be t…</td><td>&quot;concentration &quot;</td><td>&quot;of carbon dioxide initially pr…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;was generated: An increase in …</td><td>&quot;concentration &quot;</td><td>&quot;of carbon dioxide initially pr…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;bubbles produced by the plants…</td><td>&quot;attention &quot;</td><td>&quot;was paid to cutting the stem o…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;concentrations were accomplish…</td><td>&quot;solution &quot;</td><td>&quot;of 0.2% sodium bicarbonate wit…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;number of bubbles observed at …</td><td>&quot;concentration &quot;</td><td>&quot;of sodium bicarbonate in the f…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;number of oxygen bubbles obser…</td><td>&quot;concentration &quot;</td><td>&quot;of sodium bicarbonate was calc…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;of photosynthesis steadily inc…</td><td>&quot;concentration &quot;</td><td>&quot;of sodium bicarbonate increase…</td></tr><tr><td>&quot;acad_01.txt&quot;</td><td>&quot;Tables 1 and 2 in the Results &quot;</td><td>&quot;section&quot;</td><td>&quot;, the number of oxygen bubbles…</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 4)\n",
       "┌─────────────┬─────────────────────────────────┬────────────────┬─────────────────────────────────┐\n",
       "│ Doc ID      ┆ Pre-Node                        ┆ Node           ┆ Post-Node                       │\n",
       "│ ---         ┆ ---                             ┆ ---            ┆ ---                             │\n",
       "│ str         ┆ str                             ┆ str            ┆ str                             │\n",
       "╞═════════════╪═════════════════════════════════╪════════════════╪═════════════════════════════════╡\n",
       "│ acad_01.txt ┆ photosynthesis. This process o… ┆ fixation       ┆ of carbon dioxide in the prese… │\n",
       "│ acad_01.txt ┆ The end result of photosynthes… ┆ production     ┆ of organic materials, such as … │\n",
       "│ acad_01.txt ┆ factor to be tested would be t… ┆ concentration  ┆ of carbon dioxide initially pr… │\n",
       "│ acad_01.txt ┆ was generated: An increase in … ┆ concentration  ┆ of carbon dioxide initially pr… │\n",
       "│ acad_01.txt ┆ bubbles produced by the plants… ┆ attention      ┆ was paid to cutting the stem o… │\n",
       "│ acad_01.txt ┆ concentrations were accomplish… ┆ solution       ┆ of 0.2% sodium bicarbonate wit… │\n",
       "│ acad_01.txt ┆ number of bubbles observed at … ┆ concentration  ┆ of sodium bicarbonate in the f… │\n",
       "│ acad_01.txt ┆ number of oxygen bubbles obser… ┆ concentration  ┆ of sodium bicarbonate was calc… │\n",
       "│ acad_01.txt ┆ of photosynthesis steadily inc… ┆ concentration  ┆ of sodium bicarbonate increase… │\n",
       "│ acad_01.txt ┆ Tables 1 and 2 in the Results   ┆ section        ┆ , the number of oxygen bubbles… │\n",
       "└─────────────┴─────────────────────────────────┴────────────────┴─────────────────────────────────┘"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kwc.head(10)"
   ]
  },
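  {
   "cell_type": "markdown",
   "id": "kwic-sketch-01",
   "metadata": {},
   "source": [
    "To make the center-node layout concrete, here is a minimal pure-Python sketch of the same idea. It is an illustration only and not the package's implementation: the `kwic_rows` helper, its `width` parameter, and the `'starts_with'` value are assumptions made for this example (only `'fixed'` and `'ends_with'` appear in the calls above).\n",
    "\n",
    "```python\n",
    "def kwic_rows(tokens, node, search_type='fixed', ignore_case=True, width=3):\n",
    "    # Return (pre, node, post) tuples for every token that matches `node`.\n",
    "    norm = str.lower if ignore_case else (lambda s: s)\n",
    "    target = norm(node)\n",
    "    rows = []\n",
    "    for i, tok in enumerate(tokens):\n",
    "        t = norm(tok)\n",
    "        if search_type == 'fixed':\n",
    "            hit = t == target\n",
    "        elif search_type == 'starts_with':  # hypothetical value, see note above\n",
    "            hit = t.startswith(target)\n",
    "        elif search_type == 'ends_with':\n",
    "            hit = t.endswith(target)\n",
    "        else:\n",
    "            raise ValueError('unknown search_type: ' + search_type)\n",
    "        if hit:\n",
    "            # Keep `width` tokens of context on each side of the node.\n",
    "            pre = ' '.join(tokens[max(0, i - width):i])\n",
    "            post = ' '.join(tokens[i + 1:i + 1 + width])\n",
    "            rows.append((pre, tok, post))\n",
    "    return rows\n",
    "\n",
    "tokens = 'Based on the data shown in Table 1 the concentration increased'.split()\n",
    "fixed_hits = kwic_rows(tokens, 'data')\n",
    "suffix_hits = kwic_rows(tokens, 'tion', search_type='ends_with')\n",
    "```\n",
    "\n",
    "Each hit becomes one row with the node in the middle, which is the same shape as the `Pre-Node`, `Node`, and `Post-Node` columns shown above."
   ]
  },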
  {
   "cell_type": "markdown",
   "id": "8e8d4198",
   "metadata": {},
   "source": [
    "## Keyword tables\n",
    "\n",
    "[Keywords](https://eprints.lancs.ac.uk/id/eprint/140803/1/Rayson_2019_CorpusAnalysisofKeyWords_Submitted.pdf) are a common method for profiling corpora by statistically comparing token frequencies in one corpus (a target corpus) to those in another (a reference corpus).\n",
    "\n",
    "To generate a keyword list, we first need to process our reference corpus, in this case a small corpus of news articles.\n",
    "\n",
    "<div class=\"alert alert-warning\">\n",
    "    \n",
    "**Warning: Preparing frequency tables**\n",
    "\n",
    "Be sure to process target and reference corpora in precisely the same way prior to comparison.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "c90b74a9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.2 s, sys: 231 ms, total: 2.43 s\n",
      "Wall time: 8.5 s\n"
     ]
    }
   ],
   "source": [
    "corp_ref = ds.corpus_from_folder('data/ref_corpus')\n",
    "ref_tokens = ds.docuscope_parse(corp_ref, nlp_model=nlp, n_process=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bbb1738",
   "metadata": {},
   "source": [
    "Next, we will use `frequency_table` to generate two tables, one for the target corpus and one for the reference corpus:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "f6d1099d",
   "metadata": {},
   "outputs": [],
   "source": [
    "wc_target = ds.frequency_table(ds_tokens)\n",
    "wc_ref = ds.frequency_table(ref_tokens)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adda2de2",
   "metadata": {},
   "source": [
    "To generate a table of keywords, we will use `keyness_table`, which takes both our target and reference frequency tables. The Yates correction can be applied by setting the `correct` argument to `True`. Here we will leave the default, which is no correction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "35d5de8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "kw = ds.keyness_table(wc_target, wc_ref)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e9ef3fd",
   "metadata": {},
   "source": [
    "The table returns the frequency data for both corpora, along with columns for [log-likelihood](https://ucrel.lancs.ac.uk/llwizard.html) (the test of significance), [Log Ratio](http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/) (an effect size measure), and the *p*-value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "f62fbb3d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 11)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>LL</th><th>LR</th><th>PV</th><th>RF</th><th>RF_Ref</th><th>AF</th><th>AF_Ref</th><th>Range</th><th>Range_Ref</th></tr><tr><td>str</td><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>u32</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;of&quot;</td><td>&quot;IO&quot;</td><td>217.586864</td><td>0.804786</td><td>3.0392e-49</td><td>38149.827516</td><td>21838.753516</td><td>5065</td><td>691</td><td>100.0</td><td>96.0</td></tr><tr><td>&quot;the&quot;</td><td>&quot;AT&quot;</td><td>94.076679</td><td>0.349927</td><td>3.0353e-22</td><td>72382.989621</td><td>56793.400967</td><td>9610</td><td>1797</td><td>100.0</td><td>100.0</td></tr><tr><td>&quot;et al&quot;</td><td>&quot;RA&quot;</td><td>85.930266</td><td>6.582033</td><td>1.8639e-20</td><td>1513.941822</td><td>0.0</td><td>201</td><td>0</td><td>12.0</td><td>0.0</td></tr><tr><td>&quot;is&quot;</td><td>&quot;VBZ&quot;</td><td>83.80889</td><td>0.849238</td><td>5.4499e-20</td><td>13437.17518</td><td>7458.677033</td><td>1784</td><td>236</td><td>98.0</td><td>98.0</td></tr><tr><td>&quot;faculty&quot;</td><td>&quot;NN1&quot;</td><td>70.356482</td><td>5.47014</td><td>4.9500e-17</td><td>1400.961089</td><td>31.604564</td><td>186</td><td>1</td><td>4.0</td><td>2.0</td></tr><tr><td>&quot;these&quot;</td><td>&quot;DD2&quot;</td><td>67.179713</td><td>2.23679</td><td>2.4785e-16</td><td>2681.409397</td><td>568.882147</td><td>356</td><td>18</td><td>96.0</td><td>32.0</td></tr><tr><td>&quot;this&quot;</td><td>&quot;DD1&quot;</td><td>66.791235</td><td>1.042692</td><td>3.0184e-16</td><td>7682.689845</td><td>3729.338516</td><td>1020</td><td>118</td><td>100.0</td><td>84.0</td></tr><tr><td>&quot;students&quot;</td><td>&quot;NN2&quot;</td><td>49.021193</td><td>4.15015</td><td>2.5321e-12</td><td>1122.275281</td><td>63.209127</td><td>149</td><td>2</td><td>20.0</td><td>4.0</td></tr><tr>
<td>&quot;education&quot;</td><td>&quot;NN1&quot;</td><td>48.779503</td><td>4.997071</td><td>2.8642e-12</td><td>1009.294548</td><td>31.604564</td><td>134</td><td>1</td><td>14.0</td><td>2.0</td></tr><tr><td>&quot;study&quot;</td><td>&quot;NN1&quot;</td><td>48.152184</td><td>3.348834</td><td>3.9439e-12</td><td>1287.980356</td><td>126.418255</td><td>171</td><td>4</td><td>48.0</td><td>2.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 11)\n",
       "┌───────────┬─────┬────────────┬──────────┬───┬──────┬────────┬───────┬───────────┐\n",
       "│ Token     ┆ Tag ┆ LL         ┆ LR       ┆ … ┆ AF   ┆ AF_Ref ┆ Range ┆ Range_Ref │\n",
       "│ ---       ┆ --- ┆ ---        ┆ ---      ┆   ┆ ---  ┆ ---    ┆ ---   ┆ ---       │\n",
       "│ str       ┆ str ┆ f64        ┆ f64      ┆   ┆ u32  ┆ u32    ┆ f64   ┆ f64       │\n",
       "╞═══════════╪═════╪════════════╪══════════╪═══╪══════╪════════╪═══════╪═══════════╡\n",
       "│ of        ┆ IO  ┆ 217.586864 ┆ 0.804786 ┆ … ┆ 5065 ┆ 691    ┆ 100.0 ┆ 96.0      │\n",
       "│ the       ┆ AT  ┆ 94.076679  ┆ 0.349927 ┆ … ┆ 9610 ┆ 1797   ┆ 100.0 ┆ 100.0     │\n",
       "│ et al     ┆ RA  ┆ 85.930266  ┆ 6.582033 ┆ … ┆ 201  ┆ 0      ┆ 12.0  ┆ 0.0       │\n",
       "│ is        ┆ VBZ ┆ 83.80889   ┆ 0.849238 ┆ … ┆ 1784 ┆ 236    ┆ 98.0  ┆ 98.0      │\n",
       "│ faculty   ┆ NN1 ┆ 70.356482  ┆ 5.47014  ┆ … ┆ 186  ┆ 1      ┆ 4.0   ┆ 2.0       │\n",
       "│ these     ┆ DD2 ┆ 67.179713  ┆ 2.23679  ┆ … ┆ 356  ┆ 18     ┆ 96.0  ┆ 32.0      │\n",
       "│ this      ┆ DD1 ┆ 66.791235  ┆ 1.042692 ┆ … ┆ 1020 ┆ 118    ┆ 100.0 ┆ 84.0      │\n",
       "│ students  ┆ NN2 ┆ 49.021193  ┆ 4.15015  ┆ … ┆ 149  ┆ 2      ┆ 20.0  ┆ 4.0       │\n",
       "│ education ┆ NN1 ┆ 48.779503  ┆ 4.997071 ┆ … ┆ 134  ┆ 1      ┆ 14.0  ┆ 2.0       │\n",
       "│ study     ┆ NN1 ┆ 48.152184  ┆ 3.348834 ┆ … ┆ 171  ┆ 4      ┆ 48.0  ┆ 2.0       │\n",
       "└───────────┴─────┴────────────┴──────────┴───┴──────┴────────┴───────┴───────────┘"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kw.head(10)"
   ]
  },
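  {
   "cell_type": "markdown",
   "id": "keyness-sketch-01",
   "metadata": {},
   "source": [
    "For readers who want to see how these columns relate to the raw counts, the following is a hedged sketch of the standard log-likelihood and Log Ratio calculations. It is not taken from the package source: the function names are hypothetical, the corpus totals are inferred from the `AF` and `RF` columns above, and the substitution of 0.5 for a zero count in Log Ratio is an assumption that happens to reproduce the values shown.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def log_likelihood(a, b, n_target, n_ref):\n",
    "    # Expected counts if the token were equally frequent in both corpora.\n",
    "    e1 = n_target * (a + b) / (n_target + n_ref)\n",
    "    e2 = n_ref * (a + b) / (n_target + n_ref)\n",
    "    ll = 0.0\n",
    "    if a > 0:\n",
    "        ll += a * math.log(a / e1)\n",
    "    if b > 0:\n",
    "        ll += b * math.log(b / e2)\n",
    "    return 2 * ll\n",
    "\n",
    "def log_ratio(a, b, n_target, n_ref):\n",
    "    # Assumption: a zero count is replaced by 0.5 so the ratio stays finite.\n",
    "    a = a or 0.5\n",
    "    b = b or 0.5\n",
    "    return math.log2((a / n_target) / (b / n_ref))\n",
    "\n",
    "def p_value(ll):\n",
    "    # Upper tail of a chi-squared distribution with 1 degree of freedom.\n",
    "    return math.erfc(math.sqrt(ll / 2))\n",
    "\n",
    "# Totals inferred from AF / RF in the table above (an assumption).\n",
    "N_TARGET, N_REF = 132766, 31641\n",
    "\n",
    "# The 'et al' row: 201 occurrences in the target, 0 in the reference.\n",
    "ll = log_likelihood(201, 0, N_TARGET, N_REF)\n",
    "lr = log_ratio(201, 0, N_TARGET, N_REF)\n",
    "pv = p_value(ll)\n",
    "```\n",
    "\n",
    "Under these assumptions, `ll` and `lr` come out close to the `LL` and `LR` values reported for *et al* in the table above. The sketch does not cover the Yates correction."
   ]
  },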
  {
   "cell_type": "markdown",
   "id": "ebec5438",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    \n",
    "**Updates: Threshold specification**\n",
    "\n",
    "As of v0.3.0, the `keyness_table` function allows users to set a significance threshold, because a keyness table can become massive when comparing even moderate-sized corpora. The function now returns only those rows that reach the specified threshold, and only tokens whose frequency is significantly higher in the target corpus than in the reference corpus. To see the reverse (tokens significantly more frequent in the reference than in the target), swap the order of the frequency tables in the function call.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2737307d",
   "metadata": {},
   "source": [
    "The default is `threshold=0.01`, which can be seen by looking at the tail of the table, where the *p*-values sit just below 0.01:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "078b1b6f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 11)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>LL</th><th>LR</th><th>PV</th><th>RF</th><th>RF_Ref</th><th>AF</th><th>AF_Ref</th><th>Range</th><th>Range_Ref</th></tr><tr><td>str</td><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>u32</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;rail&quot;</td><td>&quot;NN1&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>2.0</td><td>0.0</td></tr><tr><td>&quot;recognize&quot;</td><td>&quot;VVI&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>18.0</td><td>0.0</td></tr><tr><td>&quot;relation&quot;</td><td>&quot;NN1&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>10.0</td><td>0.0</td></tr><tr><td>&quot;replacement&quot;</td><td>&quot;NN1&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>6.0</td><td>0.0</td></tr><tr><td>&quot;slope&quot;</td><td>&quot;NN1&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>4.0</td><td>0.0</td></tr><tr><td>&quot;suggested&quot;</td><td>&quot;VVN&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>16.0</td><td>0.0</td></tr><tr><td>&quot;technologies&quot;</td><td>&quot;NN2&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>4.0</td><td>0.0</td></tr><tr><td>&quot;wazzan&quot;</td><td>&quot;NP1&quot;</td><td>6.84022</td><td>2.930981</td><td>0.008913</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>2.0</td><td>0.0</td></tr><tr><td>&quot;welfare&quot;</td><td>&quot;NN1&quot;</td><td>6.84022</td><td>2.930981</td><td>0.00891
3</td><td>120.512782</td><td>0.0</td><td>16</td><td>0</td><td>10.0</td><td>0.0</td></tr><tr><td>&quot;how&quot;</td><td>&quot;RRQ&quot;</td><td>6.701434</td><td>0.969116</td><td>0.009634</td><td>866.18562</td><td>442.463892</td><td>115</td><td>14</td><td>70.0</td><td>24.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 11)\n",
       "┌──────────────┬─────┬──────────┬──────────┬───┬─────┬────────┬───────┬───────────┐\n",
       "│ Token        ┆ Tag ┆ LL       ┆ LR       ┆ … ┆ AF  ┆ AF_Ref ┆ Range ┆ Range_Ref │\n",
       "│ ---          ┆ --- ┆ ---      ┆ ---      ┆   ┆ --- ┆ ---    ┆ ---   ┆ ---       │\n",
       "│ str          ┆ str ┆ f64      ┆ f64      ┆   ┆ u32 ┆ u32    ┆ f64   ┆ f64       │\n",
       "╞══════════════╪═════╪══════════╪══════════╪═══╪═════╪════════╪═══════╪═══════════╡\n",
       "│ rail         ┆ NN1 ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 2.0   ┆ 0.0       │\n",
       "│ recognize    ┆ VVI ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 18.0  ┆ 0.0       │\n",
       "│ relation     ┆ NN1 ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 10.0  ┆ 0.0       │\n",
       "│ replacement  ┆ NN1 ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 6.0   ┆ 0.0       │\n",
       "│ slope        ┆ NN1 ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 4.0   ┆ 0.0       │\n",
       "│ suggested    ┆ VVN ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 16.0  ┆ 0.0       │\n",
       "│ technologies ┆ NN2 ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 4.0   ┆ 0.0       │\n",
       "│ wazzan       ┆ NP1 ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 2.0   ┆ 0.0       │\n",
       "│ welfare      ┆ NN1 ┆ 6.84022  ┆ 2.930981 ┆ … ┆ 16  ┆ 0      ┆ 10.0  ┆ 0.0       │\n",
       "│ how          ┆ RRQ ┆ 6.701434 ┆ 0.969116 ┆ … ┆ 115 ┆ 14     ┆ 70.0  ┆ 24.0      │\n",
       "└──────────────┴─────┴──────────┴──────────┴───┴─────┴────────┴───────┴───────────┘"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kw.tail(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6576fb1",
   "metadata": {},
   "source": [
    "Keyness tables can also be generated for counts of either part-of-speech or DocuScope tags. First, we prepare the frequency tables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "7559364d",
   "metadata": {},
   "outputs": [],
   "source": [
    "tag_ref = ds.tags_table(ref_tokens, count_by='pos')\n",
    "tag_tar = ds.tags_table(ds_tokens, count_by='pos')\n",
    "ds_ref = ds.tags_table(ref_tokens, count_by='ds')\n",
    "ds_tar = ds.tags_table(ds_tokens,  count_by='ds')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a11a15b3",
   "metadata": {},
   "source": [
    "We will set the `tags_only` argument to `True`, and we will also employ the Yates correction by setting `correct` to `True`. Here we also relax the threshold to 0.05:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "ebeb0adb",
   "metadata": {},
   "outputs": [],
   "source": [
    "kt = ds.keyness_table(tag_tar, tag_ref, tags_only=True, correct=True, threshold=.05)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "42381d27",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 10)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Tag</th><th>LL</th><th>LR</th><th>PV</th><th>RF</th><th>RF_Ref</th><th>AF</th><th>AF_Ref</th><th>Range</th><th>Range_Ref</th></tr><tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>u32</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;JJ&quot;</td><td>258.236798</td><td>0.554966</td><td>4.1577e-58</td><td>8.58051</td><td>5.840523</td><td>11392</td><td>1848</td><td>100.0</td><td>100.0</td></tr><tr><td>&quot;IO&quot;</td><td>217.909342</td><td>0.804786</td><td>2.5848e-49</td><td>3.814983</td><td>2.183875</td><td>5065</td><td>691</td><td>100.0</td><td>96.0</td></tr><tr><td>&quot;NN2&quot;</td><td>107.912423</td><td>0.386003</td><td>2.8092e-25</td><td>6.888812</td><td>5.271641</td><td>9146</td><td>1668</td><td>100.0</td><td>100.0</td></tr><tr><td>&quot;NN1&quot;</td><td>101.543168</td><td>0.223199</td><td>6.9923e-24</td><td>18.099513</td><td>15.505199</td><td>24030</td><td>4906</td><td>100.0</td><td>100.0</td></tr><tr><td>&quot;AT&quot;</td><td>90.876836</td><td>0.340048</td><td>1.5290e-21</td><td>7.324918</td><td>5.786796</td><td>9725</td><td>1831</td><td>100.0</td><td>100.0</td></tr><tr><td>&quot;RR&quot;</td><td>81.123951</td><td>0.508681</td><td>2.1199e-19</td><td>3.134086</td><td>2.202838</td><td>4161</td><td>697</td><td>100.0</td><td>98.0</td></tr><tr><td>&quot;ZZ1&quot;</td><td>67.0445</td><td>2.044044</td><td>2.6545e-16</td><td>0.299776</td><td>0.07269</td><td>398</td><td>23</td><td>54.0</td><td>28.0</td></tr><tr><td>&quot;VVZ&quot;</td><td>62.211092</td><td>0.706523</td><td>3.0855e-15</td><td>1.35125</td><td>0.82804</td><td>1794</td><td>262</td><td>98.0</td><td>92.0</td></tr><tr><td>&quot;RGR&quot;</td><td>57.142521</td><td>2.262496</td><td>4.0535e-14</td><td>0.227468</td><td>0.047407</td><td>302</td><td>15</td><td>86.0</td><td>22.0</td></tr><tr><td>&quot;DD1&quot;</td><td>55.060338</td><td>0.732546</td><td>1.1689e-13
</td><td>1.123782</td><td>0.676338</td><td>1492</td><td>214</td><td>100.0</td><td>94.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 10)\n",
       "┌─────┬────────────┬──────────┬────────────┬───┬───────┬────────┬───────┬───────────┐\n",
       "│ Tag ┆ LL         ┆ LR       ┆ PV         ┆ … ┆ AF    ┆ AF_Ref ┆ Range ┆ Range_Ref │\n",
       "│ --- ┆ ---        ┆ ---      ┆ ---        ┆   ┆ ---   ┆ ---    ┆ ---   ┆ ---       │\n",
       "│ str ┆ f64        ┆ f64      ┆ f64        ┆   ┆ u32   ┆ u32    ┆ f64   ┆ f64       │\n",
       "╞═════╪════════════╪══════════╪════════════╪═══╪═══════╪════════╪═══════╪═══════════╡\n",
       "│ JJ  ┆ 258.236798 ┆ 0.554966 ┆ 4.1577e-58 ┆ … ┆ 11392 ┆ 1848   ┆ 100.0 ┆ 100.0     │\n",
       "│ IO  ┆ 217.909342 ┆ 0.804786 ┆ 2.5848e-49 ┆ … ┆ 5065  ┆ 691    ┆ 100.0 ┆ 96.0      │\n",
       "│ NN2 ┆ 107.912423 ┆ 0.386003 ┆ 2.8092e-25 ┆ … ┆ 9146  ┆ 1668   ┆ 100.0 ┆ 100.0     │\n",
       "│ NN1 ┆ 101.543168 ┆ 0.223199 ┆ 6.9923e-24 ┆ … ┆ 24030 ┆ 4906   ┆ 100.0 ┆ 100.0     │\n",
       "│ AT  ┆ 90.876836  ┆ 0.340048 ┆ 1.5290e-21 ┆ … ┆ 9725  ┆ 1831   ┆ 100.0 ┆ 100.0     │\n",
       "│ RR  ┆ 81.123951  ┆ 0.508681 ┆ 2.1199e-19 ┆ … ┆ 4161  ┆ 697    ┆ 100.0 ┆ 98.0      │\n",
       "│ ZZ1 ┆ 67.0445    ┆ 2.044044 ┆ 2.6545e-16 ┆ … ┆ 398   ┆ 23     ┆ 54.0  ┆ 28.0      │\n",
       "│ VVZ ┆ 62.211092  ┆ 0.706523 ┆ 3.0855e-15 ┆ … ┆ 1794  ┆ 262    ┆ 98.0  ┆ 92.0      │\n",
       "│ RGR ┆ 57.142521  ┆ 2.262496 ┆ 4.0535e-14 ┆ … ┆ 302   ┆ 15     ┆ 86.0  ┆ 22.0      │\n",
       "│ DD1 ┆ 55.060338  ┆ 0.732546 ┆ 1.1689e-13 ┆ … ┆ 1492  ┆ 214    ┆ 100.0 ┆ 94.0      │\n",
       "└─────┴────────────┴──────────┴────────────┴───┴───────┴────────┴───────┴───────────┘"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kt.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6aff7a1",
   "metadata": {},
   "source": [
    "We can do the same for the DocuScope frequency tables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "id": "0bf2450a",
   "metadata": {},
   "outputs": [],
   "source": [
    "kds = ds.keyness_table(ds_tar, ds_ref, tags_only=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "id": "f5314f03",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 10)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Tag</th><th>LL</th><th>LR</th><th>PV</th><th>RF</th><th>RF_Ref</th><th>AF</th><th>AF_Ref</th><th>Range</th><th>Range_Ref</th></tr><tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>u32</td><td>u32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;CitationHedged&quot;</td><td>6.981271</td><td>2.954139</td><td>0.008237</td><td>0.015617</td><td>0.0</td><td>17</td><td>0</td><td>20.0</td><td>0.0</td></tr><tr><td>&quot;AcademicWritingMoves&quot;</td><td>51.654651</td><td>1.311183</td><td>6.6174e-13</td><td>0.530053</td><td>0.213606</td><td>577</td><td>53</td><td>94.0</td><td>52.0</td></tr><tr><td>&quot;AcademicTerms&quot;</td><td>729.47416</td><td>1.205083</td><td>1.1656e-160</td><td>8.492793</td><td>3.683701</td><td>9245</td><td>914</td><td>100.0</td><td>98.0</td></tr><tr><td>&quot;InformationChange&quot;</td><td>101.904145</td><td>1.1768</td><td>5.8274e-24</td><td>1.230054</td><td>0.544092</td><td>1339</td><td>135</td><td>100.0</td><td>80.0</td></tr><tr><td>&quot;MetadiscourseInteractive&quot;</td><td>31.731942</td><td>1.143007</td><td>1.7699e-8</td><td>0.400525</td><td>0.181364</td><td>436</td><td>45</td><td>100.0</td><td>50.0</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 10)\n",
       "┌────────────────────┬────────────┬──────────┬─────────────┬───┬──────┬────────┬───────┬───────────┐\n",
       "│ Tag                ┆ LL         ┆ LR       ┆ PV          ┆ … ┆ AF   ┆ AF_Ref ┆ Range ┆ Range_Ref │\n",
       "│ ---                ┆ ---        ┆ ---      ┆ ---         ┆   ┆ ---  ┆ ---    ┆ ---   ┆ ---       │\n",
       "│ str                ┆ f64        ┆ f64      ┆ f64         ┆   ┆ u32  ┆ u32    ┆ f64   ┆ f64       │\n",
       "╞════════════════════╪════════════╪══════════╪═════════════╪═══╪══════╪════════╪═══════╪═══════════╡\n",
       "│ CitationHedged     ┆ 6.981271   ┆ 2.954139 ┆ 0.008237    ┆ … ┆ 17   ┆ 0      ┆ 20.0  ┆ 0.0       │\n",
       "│ AcademicWritingMov ┆ 51.654651  ┆ 1.311183 ┆ 6.6174e-13  ┆ … ┆ 577  ┆ 53     ┆ 94.0  ┆ 52.0      │\n",
       "│ es                 ┆            ┆          ┆             ┆   ┆      ┆        ┆       ┆           │\n",
       "│ AcademicTerms      ┆ 729.47416  ┆ 1.205083 ┆ 1.1656e-160 ┆ … ┆ 9245 ┆ 914    ┆ 100.0 ┆ 98.0      │\n",
       "│ InformationChange  ┆ 101.904145 ┆ 1.1768   ┆ 5.8274e-24  ┆ … ┆ 1339 ┆ 135    ┆ 100.0 ┆ 80.0      │\n",
       "│ MetadiscourseInter ┆ 31.731942  ┆ 1.143007 ┆ 1.7699e-8   ┆ … ┆ 436  ┆ 45     ┆ 100.0 ┆ 50.0      │\n",
       "│ active             ┆            ┆          ┆             ┆   ┆      ┆        ┆       ┆           │\n",
       "└────────────────────┴────────────┴──────────┴─────────────┴───┴──────┴────────┴───────┴───────────┘"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "kds.sort(\"LR\", descending=True).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b9f6166",
   "metadata": {},
   "source": [
    "## Single document tag highlighting\n",
    "\n",
    "Tags (either part-of-speech or DocuScope) can be highlighted in single documents. In order facilitate the highlighing of tags, the `tag_ruler` function generates a data frame with the complete document text and the spans of tagged tokens. From that data frame, the original document text can be easily recovered, and any tags of interest can be filtered for highlighting.\n",
    "\n",
    "To render the highlights, an additionally package is needed. For this demonstration, we will use (ipymarkup)[https://nbviewer.org/github/natasha/ipymarkup/blob/master/docs.ipynb], which is simple and flexible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "id": "3ee8550d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ipymarkup import show_span_box_markup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c4970aa",
   "metadata": {},
   "source": [
    "When calling the `tag_ruler` function, a doc_id needs to be specificed. Those can be recovered easily from the tokens table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "id": "8eec2a64",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5,)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th></tr><tr><td>str</td></tr></thead><tbody><tr><td>&quot;acad_01.txt&quot;</td></tr><tr><td>&quot;acad_02.txt&quot;</td></tr><tr><td>&quot;acad_03.txt&quot;</td></tr><tr><td>&quot;acad_04.txt&quot;</td></tr><tr><td>&quot;acad_05.txt&quot;</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5,)\n",
       "Series: 'doc_id' [str]\n",
       "[\n",
       "\t\"acad_01.txt\"\n",
       "\t\"acad_02.txt\"\n",
       "\t\"acad_03.txt\"\n",
       "\t\"acad_04.txt\"\n",
       "\t\"acad_05.txt\"\n",
       "]"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds_tokens.get_column(\"doc_id\").unique().sort().head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "id": "67fefb63",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_pos = ds.tag_ruler(ds_tokens, doc_id='acad_17.txt', count_by='pos')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16ac7eb6",
   "metadata": {},
   "source": [
    "The data frame contains all tokens, tags and start/end of spans:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "id": "f5b91564",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (20, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>tag_start</th><th>tag_end</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;In &quot;</td><td>&quot;II&quot;</td><td>0</td><td>2</td></tr><tr><td>&quot;the &quot;</td><td>&quot;AT&quot;</td><td>3</td><td>6</td></tr><tr><td>&quot;societal &quot;</td><td>&quot;JJ&quot;</td><td>7</td><td>15</td></tr><tr><td>&quot;realm &quot;</td><td>&quot;NN1&quot;</td><td>16</td><td>21</td></tr><tr><td>&quot;in &quot;</td><td>&quot;II&quot;</td><td>22</td><td>24</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;are &quot;</td><td>&quot;VBR&quot;</td><td>90</td><td>93</td></tr><tr><td>&quot;starkly &quot;</td><td>&quot;RR&quot;</td><td>94</td><td>101</td></tr><tr><td>&quot;defined&quot;</td><td>&quot;VVN&quot;</td><td>102</td><td>109</td></tr><tr><td>&quot;. &quot;</td><td>&quot;Y&quot;</td><td>109</td><td>110</td></tr><tr><td>&quot;Notions &quot;</td><td>&quot;NN2&quot;</td><td>111</td><td>118</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (20, 4)\n",
       "┌───────────┬─────┬───────────┬─────────┐\n",
       "│ Token     ┆ Tag ┆ tag_start ┆ tag_end │\n",
       "│ ---       ┆ --- ┆ ---       ┆ ---     │\n",
       "│ str       ┆ str ┆ u32       ┆ u32     │\n",
       "╞═══════════╪═════╪═══════════╪═════════╡\n",
       "│ In        ┆ II  ┆ 0         ┆ 2       │\n",
       "│ the       ┆ AT  ┆ 3         ┆ 6       │\n",
       "│ societal  ┆ JJ  ┆ 7         ┆ 15      │\n",
       "│ realm     ┆ NN1 ┆ 16        ┆ 21      │\n",
       "│ in        ┆ II  ┆ 22        ┆ 24      │\n",
       "│ …         ┆ …   ┆ …         ┆ …       │\n",
       "│ are       ┆ VBR ┆ 90        ┆ 93      │\n",
       "│ starkly   ┆ RR  ┆ 94        ┆ 101     │\n",
       "│ defined   ┆ VVN ┆ 102       ┆ 109     │\n",
       "│ .         ┆ Y   ┆ 109       ┆ 110     │\n",
       "│ Notions   ┆ NN2 ┆ 111       ┆ 118     │\n",
       "└───────────┴─────┴───────────┴─────────┘"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_pos.head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88042032",
   "metadata": {},
   "source": [
    "The output can easily be filtered, as it here for part-of-speech tags starting with 'N' (or nouns):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "id": "a816d18e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>tag_start</th><th>tag_end</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;realm &quot;</td><td>&quot;NN1&quot;</td><td>16</td><td>21</td></tr><tr><td>&quot;Middlemarch &quot;</td><td>&quot;NP1&quot;</td><td>31</td><td>42</td></tr><tr><td>&quot;demarcation &quot;</td><td>&quot;NN1&quot;</td><td>56</td><td>67</td></tr><tr><td>&quot;women &quot;</td><td>&quot;NN2&quot;</td><td>76</td><td>81</td></tr><tr><td>&quot;men &quot;</td><td>&quot;NN2&quot;</td><td>86</td><td>89</td></tr><tr><td>&quot;Notions &quot;</td><td>&quot;NN2&quot;</td><td>111</td><td>118</td></tr><tr><td>&quot;male &quot;</td><td>&quot;NN1&quot;</td><td>122</td><td>126</td></tr><tr><td>&quot;character &quot;</td><td>&quot;NN1&quot;</td><td>138</td><td>147</td></tr><tr><td>&quot;perspective&quot;</td><td>&quot;NN1&quot;</td><td>176</td><td>187</td></tr><tr><td>&quot;reading &quot;</td><td>&quot;NN1&quot;</td><td>229</td><td>236</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 4)\n",
       "┌──────────────┬─────┬───────────┬─────────┐\n",
       "│ Token        ┆ Tag ┆ tag_start ┆ tag_end │\n",
       "│ ---          ┆ --- ┆ ---       ┆ ---     │\n",
       "│ str          ┆ str ┆ u32       ┆ u32     │\n",
       "╞══════════════╪═════╪═══════════╪═════════╡\n",
       "│ realm        ┆ NN1 ┆ 16        ┆ 21      │\n",
       "│ Middlemarch  ┆ NP1 ┆ 31        ┆ 42      │\n",
       "│ demarcation  ┆ NN1 ┆ 56        ┆ 67      │\n",
       "│ women        ┆ NN2 ┆ 76        ┆ 81      │\n",
       "│ men          ┆ NN2 ┆ 86        ┆ 89      │\n",
       "│ Notions      ┆ NN2 ┆ 111       ┆ 118     │\n",
       "│ male         ┆ NN1 ┆ 122       ┆ 126     │\n",
       "│ character    ┆ NN1 ┆ 138       ┆ 147     │\n",
       "│ perspective  ┆ NN1 ┆ 176       ┆ 187     │\n",
       "│ reading      ┆ NN1 ┆ 229       ┆ 236     │\n",
       "└──────────────┴─────┴───────────┴─────────┘"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_n = df_pos.filter(pl.col(\"Tag\").str.starts_with(\"N\"))\n",
    "df_n.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e84c03a",
   "metadata": {},
   "source": [
    "First, we will reconstruct the document text from the **full** data frame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "id": "4e89d883",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = ''.join(df_pos['Token'].to_list())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0264f83e",
   "metadata": {},
   "source": [
    "Next, we will contruct a list a tuples from the **filtered** data frame, using the `tag_start`, `tag_end` and `Tag` columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "id": "dcf3d591",
   "metadata": {},
   "outputs": [],
   "source": [
    "spans = list(zip(list(df_n['tag_start']), list(df_n['tag_end']), list(df_n['Tag'])))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94701e48",
   "metadata": {},
   "source": [
    "Finally, we can use `show_span_box_markup` to highlight the tags:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "id": "28e4ac8d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"tex2jax_ignore\" style=\"white-space: pre-wrap\">In the societal <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">realm<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in which <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Middlemarch<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> resides, the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">demarcation<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> between <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">women<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">men<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> are starkly defined. 
<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">Notions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">male<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and female <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> are, especially to a modern <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">perspective<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, skewed -- and it is clear from a modern <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">reading<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">effects<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of this social <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conditioning<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">cause<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">detriment<span style=\"vertical-align: middle; margin-left: 2px; font-size: 
0.7em; color: #64b5f6;\">NN1</span></span> in the individual <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">characters<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> and their <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">relationships<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">others<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> in the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">novel<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. Perhaps the most <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">resonant<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the ill-<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">effects<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of social <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conditioning<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; 
margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> who is guided by the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">principles<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of supposed <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">womanhood<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that have been, since <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">childhood<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, ingrained into her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">psyche<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
She was painstakingly taught, by means of formal <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">instruction<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, the supposed <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">qualities<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">womanhood<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, and because of this, the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">reader<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is shown, she exists as <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Eliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s hyper-socialized female <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
She wishes to be treated as a delicate being incapable of invoking <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">harm<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> -- she manipulates and obtains her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">desires<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> by emphasizing the female <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">stereotype<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> -- forgoing <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">passion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and at <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffe0b2; background: #fff3e0\">times<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #ffb74d;\">NNT2</span></span> veritable <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">emotion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> for the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">obtaining<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of worldly <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">prospects<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. 
These <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">prospects<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> are greatly concerned with social <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mobility<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and she is, like many <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">characters<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> in <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Eliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s novel blinded by these <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">desires<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>, a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fact<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that brings about her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">inability<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to separate the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">reality<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">circumstance<span style=\"vertical-align: 
middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, from her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">conceptions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of ideal <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">scenario<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that are, much like that from Arabian <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffe0b2; background: #fff3e0\">Nights<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #ffb74d;\">NNT2</span></span>, characterized by the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">absence<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">responsibility<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> (mental and physical, it seems), and the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">presence<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">prestige<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> Her rather grandiose <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">ideas<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of <span style=\"padding: 2px; border-radius: 
4px; border: 1px solid #bbdefb; background: #e3f2fd\">life<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> as it should be, and her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">ignoring<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">life<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> as it is, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">results<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> in <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s strained <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">relationship<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> with <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> -- spurred by her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">devotion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to being completely absolved from <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fault<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, 
and her blind <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">attachment<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to the superficial <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">notions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of high-<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">society<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">lineage<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">marriage<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> don&#x27;t give her the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">capacity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to obtain. 
It seems <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Eliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> designed <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the real and ideal, while contrasting it with that of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Dorothea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s whose <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is only further <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">indication<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of her admirable <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">humanity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, to show and emphasize the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">effects<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">women<span 
style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> operating under an imposing <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sphere<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that purports <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">loss<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>-of-<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">self<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> as the only <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">road<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">success<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
It could be said that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">affinity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> was borne by the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fact<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that his actual <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">past<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> was much of a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mystery<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
This allowed <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> to impose her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">ideas<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of the ideal <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> onto him, and as the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">ideas<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> she imposed were essentially stunning, in a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sense<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> she became the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">instigator<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of her own <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">courtship<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, converting <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">flirtation<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to love, when the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">reader<span 
style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> knows otherwise. The <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">narrator<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> states, &quot;<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> thought that no one could be more in <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">love<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> than she was,&quot; (<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Elliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, 295) and the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">insertion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of &quot;<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">thought<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>&quot; into the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">equation<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> emphasizes her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">illusion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of genuine <span style=\"padding: 
2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">feeling<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. This is one of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">example<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">instances<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> throughout the novel <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Elliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> gives subtle <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">clues<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> to the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fact<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">emotions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">truths<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> are not real: she 
more than once &quot;<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">imagines<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">knowledge<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>,&quot; and rather than being right, the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">narrator<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> maintains she is &quot;convinced&quot; that she is. The <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">disparity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> between <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fixation<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> on her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">marriage<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, and the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fact<span 
style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that he is initially apathetic to it, brings about a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that is telling to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Eliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sentiment<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in terms of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">women<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> in a broad <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sense<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
First, it is <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clue<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> into the genuine <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">motive<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, that being to devise a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">life<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> for herself rather than relying on <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">providence<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> was a mere <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">story<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> she wishes to create, a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fantasy<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in which she exists as an ephemeral <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">entity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to be sought after, ultimately achieved and lifted to great, eminent <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">heights<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. 
She is, one might say, acting as a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #d1c4e9; background: #ede7f6\">time<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #9575cd;\">NNT1</span></span> should -- with a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sense<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">helplessness<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, and a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">faith<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that her male <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">savior<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> will present himself. 
What the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">reader<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> sees, however, is that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is too operating in his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sphere<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">manhood<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, as he is far from invested in <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, but rather enchanted by her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">beauty<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and girlish <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">affectations<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. 
He regards <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">imposing<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the ideal onto him as a mere <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">tendency<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the female <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mind<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>: &quot;[<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>] held it one of the prettiest <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">attitudes<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of the feminine <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mind<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to adore a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">man<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>&#x27;s pre-<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">eminence<span style=\"vertical-align: middle; margin-left: 
2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> without too precise a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">knowledge<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of what it consisted in.&quot; (<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Elliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, 234) This <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">inclination<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> suggests that his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">ideas<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of the feminine <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mind<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, are associated with naive <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">delusion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">weakness<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: 
#ffebee\">characteristics<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is drawn to, although more for his own <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">desire<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to assuage than for an <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">affinity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to the afflicted. In this initial <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">interplay<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> between <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s conflicted &quot;real&quot; and &quot;ideal&quot; tangles their <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">ideas<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; 
color: #e57373;\">NN2</span></span> of one another, based on the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">roles<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> they play as male and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">female<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. On one <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">end<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">placing<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">pre<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>-<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">eminence<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> on <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> reinforces <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">notions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 
0.7em; color: #e57373;\">NN2</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">male<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>-<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">capacity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> (not to mention her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">deeming<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of him as refined based on <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">surface<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>-level <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">qualities<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>, such as his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">knowledge<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the French <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">language<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>) and as <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is flattered by her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">assumption<span 
style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, he reinforces her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">role<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> as one whose mental <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">capacity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is lacking and whose <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mind<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is dull, but &quot;pretty&quot; still. To him, she is weak -- a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fact<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that he relishes. 
The <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">reader<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> sees this <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">interplay<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> again, more intensely, during the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">scene<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">engagement<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">sorts<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. 
And thus, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> between the real and ideal engendered the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">outcome<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> she so desired -- but the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">foreshadowing<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of future <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">dismay<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is all too apparent. 
Describing the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">narrator<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">states<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>, on <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">page<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> 289, &quot;<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> was particularly forcible by means of that mild <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">persistence<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> which, as we know, enables a white soft living <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">substance<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to make it s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">way<span style=\"vertical-align: 
middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in spite of opposing <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">rock<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>.&quot; <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, perhaps the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">epitome<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of female <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">delicacy<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, so strongly <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">adheres<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> to her ideal <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">world<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, that she is exasperatingly ardent her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">manipulation<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
This <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">idea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is manifested most blatantly in her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">marriage<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that is strained by <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">desire<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to have a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">wife<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that is secondary to his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">career<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">desire<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to have a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">husband<span style=\"vertical-align: middle; margin-left: 2px; 
font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that unrelentingly places her first. She defies his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">will<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> even when he has her best <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">interest<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mind<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> -- forgoing his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">advice<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to refrain from <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">horseback<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> riding for the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sake<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of posturing with <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #d7ccc8; background: #efebe9\">Captain<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #a1887f;\">NNB</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>. 
At the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">onset<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of their financial <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">woes<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> acts as if <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> wishes to spite her, placing the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">blame<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> on him, when in <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">actuality<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> all he had done was fail to live up to her grandiose <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">expectations<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. 
She mistakes his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">exasperation<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> with her and their <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">marriage<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> as mere moodiness, and dismisses his ill-<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">dispositions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> to ensure that she is not affected by them. The <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">narrator<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> states, &quot;the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">thought<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mind<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> was that if she had known <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, she would have never married him&quot; (<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Elliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, 471), and what the <span style=\"padding: 2px; border-radius: 
4px; border: 1px solid #bbdefb; background: #e3f2fd\">reader<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> sees, that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> does not, is that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> feels much of the same. <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> is unaware of this because she regards herself as the ideal, the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">embodiment<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the perfect female <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">specimen<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> who &quot;no <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> could behave more irreproachably&quot; than (472), completely free from <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: 
#e3f2fd\">culpability<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">victim<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">husband<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> who &quot;had a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">way<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of taking <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">things<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> which made them a great <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">deal<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> worse for her.&quot; The <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">reality<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of it, however, is that she is childish and artificial, a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of &quot;polite <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">impassibility<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: 
#64b5f6;\">NN1</span></span>&quot; (609), perhaps the only <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> who remains throughout <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Middlemarch<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, as morally stupid and one-dimensional as she began. Through the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fashioning<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, it seems <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Elliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> adhered to a strict <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">notion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">femininity<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> -- one that was perhaps the pervasive <span style=\"padding: 2px; border-radius: 
4px; border: 1px solid #bbdefb; background: #e3f2fd\">notion<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> at the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #d1c4e9; background: #ede7f6\">time<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #9575cd;\">NNT1</span></span>. The <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">strain<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> in <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">marriage<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> reaches a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">head<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, at the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">point<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> when <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">Lydgate<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is &quot;prone to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">outbursts<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 
1px solid #bbdefb; background: #e3f2fd\">indignation<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>,&quot; and his <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">enchantment<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> with his coy <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">mistress<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> has changed to subtle <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">resentment<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. He realizes, he didn&#x27;t marry a virtuous <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, but rather his own idealized <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">view<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of what this <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> was based on socially accepted (<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">surface<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">level<span style=\"vertical-align: middle; margin-left: 2px; 
font-size: 0.7em; color: #64b5f6;\">NN1</span></span>) <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">ideas<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. Moreover, he realizes that although he has &quot;spent <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #d1c4e9; background: #ede7f6\">month<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #9575cd;\">NNT1</span></span> after <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #d1c4e9; background: #ede7f6\">month<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #9575cd;\">NNT1</span></span> sacrafising without <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">impatience<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>&quot; (464) <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">thirst<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> for <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">wealth<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">eminence<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and all the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">things<span 
style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> she thinks will give <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">merit<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">womanhood<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is impossible to quench. &quot;It is the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">way<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> with all <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>,&quot; he says. 
However, &quot;[his] <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">power<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of generalizing all <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">women<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>...was thwarted by [his] <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">memory<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of wondering <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">impressions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> from the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">behavior<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of another <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>.&quot; (468) That <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, of course, being <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Dorothea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>. 
There are two salient interplays between <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Dorothea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> in relation to the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> between the real and ideal. The first being the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">nature<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of the two <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">characters<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>&#x27; own <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">conflicts<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. 
<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is purely of worldly <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">affairs<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> -- she wishes to become something that represents something else. She negates her inner <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">vitality<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and becomes a mechanical <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">being<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, whose <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">desires<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> are to be adorned and to be scorned through jealously. 
<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Dorothea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, conversely is her unrelenting <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">attachment<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> to the good of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">others<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span>. One of the final <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">scene<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Middlemarch<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, in which she meets <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, she assumes, wrongly, that Rosamoned&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">actions<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> are pure. 
<span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Dorothea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">conflict<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> is spurred by the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">fact<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> that she herself is a pure human being -- <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s is spurred by her diluted <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">consciousness<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. The second <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">interplay<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> moves away from the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">novel<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> and into it s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">context<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>. 
Could <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Elliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> have, in her two main female <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">character<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> presented her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">ideas<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> of the real and ideal? It is perhaps a cynical <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">view<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> from the <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">author<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> (whose <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">attitudes<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> towards <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">woman<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> were rather cynical) because it seems <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Dorothea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> represents the ideal, while <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: 
#e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> in all of her outward <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">grace<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> but inner <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">spoil<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>, represents the real. And as <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Dorothea<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>&#x27;s <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">aspirations<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> are never realized, the real <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">story<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span> of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">women<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Elliot<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span> may be suggesting, is that of <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">Rosamond<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">NP1</span></span>, who stayed &quot;in her <span 
style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">place<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">NN1</span></span>&quot; and had her <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">dreams<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">NN2</span></span> (of marrying rich) ultimately fulfilled. </div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "show_span_box_markup(text, spans)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02535637",
   "metadata": {},
   "source": [
    "The same thing can be done for DocuScope tags by switching `count_by` to 'ds':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "id": "c40bf491",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (20, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>tag_start</th><th>tag_end</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;Often &quot;</td><td>&quot;Narrative&quot;</td><td>0</td><td>5</td></tr><tr><td>&quot;referred &quot;</td><td>&quot;InformationReportVerbs&quot;</td><td>6</td><td>14</td></tr><tr><td>&quot;to &quot;</td><td>&quot;InformationReportVerbs&quot;</td><td>15</td><td>17</td></tr><tr><td>&quot;as &quot;</td><td>&quot;InformationReportVerbs&quot;</td><td>18</td><td>20</td></tr><tr><td>&quot;the &quot;</td><td>&quot;Untagged&quot;</td><td>21</td><td>24</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;argument &quot;</td><td>&quot;AcademicTerms&quot;</td><td>83</td><td>91</td></tr><tr><td>&quot;about &quot;</td><td>&quot;Untagged&quot;</td><td>92</td><td>97</td></tr><tr><td>&quot;the &quot;</td><td>&quot;Untagged&quot;</td><td>98</td><td>101</td></tr><tr><td>&quot;existence &quot;</td><td>&quot;Untagged&quot;</td><td>102</td><td>111</td></tr><tr><td>&quot;of &quot;</td><td>&quot;PublicTerms&quot;</td><td>112</td><td>114</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (20, 4)\n",
       "┌────────────┬────────────────────────┬───────────┬─────────┐\n",
       "│ Token      ┆ Tag                    ┆ tag_start ┆ tag_end │\n",
       "│ ---        ┆ ---                    ┆ ---       ┆ ---     │\n",
       "│ str        ┆ str                    ┆ u32       ┆ u32     │\n",
       "╞════════════╪════════════════════════╪═══════════╪═════════╡\n",
       "│ Often      ┆ Narrative              ┆ 0         ┆ 5       │\n",
       "│ referred   ┆ InformationReportVerbs ┆ 6         ┆ 14      │\n",
       "│ to         ┆ InformationReportVerbs ┆ 15        ┆ 17      │\n",
       "│ as         ┆ InformationReportVerbs ┆ 18        ┆ 20      │\n",
       "│ the        ┆ Untagged               ┆ 21        ┆ 24      │\n",
       "│ …          ┆ …                      ┆ …         ┆ …       │\n",
       "│ argument   ┆ AcademicTerms          ┆ 83        ┆ 91      │\n",
       "│ about      ┆ Untagged               ┆ 92        ┆ 97      │\n",
       "│ the        ┆ Untagged               ┆ 98        ┆ 101     │\n",
       "│ existence  ┆ Untagged               ┆ 102       ┆ 111     │\n",
       "│ of         ┆ PublicTerms            ┆ 112       ┆ 114     │\n",
       "└────────────┴────────────────────────┴───────────┴─────────┘"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_ds = ds.tag_ruler(ds_tokens, doc_id='acad_37.txt', count_by='ds')\n",
    "df_ds.head(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f700e87",
   "metadata": {},
   "source": [
    "This time, we'll filter for tags related to expressions of confidence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "b0af035f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (10, 4)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>Token</th><th>Tag</th><th>tag_start</th><th>tag_end</th></tr><tr><td>str</td><td>str</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;very &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>66</td><td>70</td></tr><tr><td>&quot;clearly &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>371</td><td>378</td></tr><tr><td>&quot;distinctly &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>383</td><td>393</td></tr><tr><td>&quot;clearly &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>563</td><td>570</td></tr><tr><td>&quot;distinctly &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>575</td><td>585</td></tr><tr><td>&quot;is &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>596</td><td>598</td></tr><tr><td>&quot;true&quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>599</td><td>603</td></tr><tr><td>&quot;are &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>729</td><td>732</td></tr><tr><td>&quot;true&quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>733</td><td>737</td></tr><tr><td>&quot;clearly &quot;</td><td>&quot;ConfidenceHigh&quot;</td><td>789</td><td>796</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (10, 4)\n",
       "┌─────────────┬────────────────┬───────────┬─────────┐\n",
       "│ Token       ┆ Tag            ┆ tag_start ┆ tag_end │\n",
       "│ ---         ┆ ---            ┆ ---       ┆ ---     │\n",
       "│ str         ┆ str            ┆ u32       ┆ u32     │\n",
       "╞═════════════╪════════════════╪═══════════╪═════════╡\n",
       "│ very        ┆ ConfidenceHigh ┆ 66        ┆ 70      │\n",
       "│ clearly     ┆ ConfidenceHigh ┆ 371       ┆ 378     │\n",
       "│ distinctly  ┆ ConfidenceHigh ┆ 383       ┆ 393     │\n",
       "│ clearly     ┆ ConfidenceHigh ┆ 563       ┆ 570     │\n",
       "│ distinctly  ┆ ConfidenceHigh ┆ 575       ┆ 585     │\n",
       "│ is          ┆ ConfidenceHigh ┆ 596       ┆ 598     │\n",
       "│ true        ┆ ConfidenceHigh ┆ 599       ┆ 603     │\n",
       "│ are         ┆ ConfidenceHigh ┆ 729       ┆ 732     │\n",
       "│ true        ┆ ConfidenceHigh ┆ 733       ┆ 737     │\n",
       "│ clearly     ┆ ConfidenceHigh ┆ 789       ┆ 796     │\n",
       "└─────────────┴────────────────┴───────────┴─────────┘"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_c = df_ds.filter(pl.col(\"Tag\").str.starts_with(\"Conf\"))\n",
    "df_c.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe71de8d",
   "metadata": {},
   "source": [
    "Again, the text is reconstructed from the full data frame, and the spans are taken from the filtered one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "id": "1fb90a59",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"tex2jax_ignore\" style=\"white-space: pre-wrap\">Often referred to as the &quot;Cartesian Circle&quot;, Descartes presents a <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">very<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> problematic argument about the existence of God. He presupposes the truth of the premise of clear and distinct perception in order to prove the existence of God. Then once he proves the existence of God, he uses it to prove the validity of the clear and distinct perception premise; that whatever we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive must be true. 
In the excerpt on page 105 of Descartes&#x27; Meditations, he provides the missing explanation of the logic behind the idea that anything that someone <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceives <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">is<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">true<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span>. The first premise that Descartes provides is that there exist some things that we can never think of without believing they <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">are<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">true<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span>. 
Descartes refers to these things as those that we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive. When we do try to imagine that these things are false, it <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">simply<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> does not make sense. Descartes gives two examples of this: 1) I exist so long as I am thinking and 2) what is done cannot be undone. <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">We<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">can<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> try to imagine these premises being false, however when we get into details about how <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">they<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">could<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: 
#e8f5e9\">be<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> false we quickly lose our way. As a result, Descartes concludes that every time we recall these ideas into our minds, we believe that they <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">are<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">true<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span>. The next premise that Descartes provides is that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">we<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">can<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span>not doubt an idea without simultaneously thinking of it. He does not go into much detail about this argument, because it is very much an obvious point to make. In order to decide that we do not agree with something, we must first recall it into our mind; <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">we<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">can<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span>not simply disagree with something without first thinking of the idea. 
Although this idea is <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">seemingly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">very<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">obvious<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span>, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">it<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">is<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> nonetheless an important premise for his later conclusion. 
Descartes then draws from these two premises the conclusion that any time we doubt something that we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive, we at the same time believe that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">it<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">is<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">true<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span>. According to the second premise, in order to doubt an idea, we first bring that idea into our heads. However, according to the first premise, we are instantaneously convinced of the truth of the premise when we bring the idea into our head because we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive it. 
So when we doubt any of these ideas, we also believe the ideas at the same time. A third premise that Descartes uses is that <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">it<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">is<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> impossible to both doubt something and believe it to be true at the same time. These are mutually exclusive states of mind; <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">it<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">is<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">a<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> logical impossibility to both doubt and believe something to be true simultaneously. 
Overall this premise is very <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">obvious<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span>, but <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">it<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">is<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> required for Descartes&#x27; argument to be complete. Using this third premise and the first conclusion, Descartes draws his final conclusion: <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">we<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">can<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">never<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> doubt what we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive. 
The three premises together lead us to a logical impossibility, one element of the premises must be logically <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #ffcdd2; background: #ffebee\">impossible<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #e57373;\">ConfidenceLow</span></span>. To further his argument, he decided that the impossible element is the act of doubting the things which we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive. Doubting these ideas leads us to an impossible state of both belief and doubt, so it we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">simply<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> cannot doubt them. The reason why this excerpt fits in with the main purpose of the Meditations is that it finally gives a clear definition of clear and distinct perception. 
Throughout the Meditations, Descartes builds up the argument that if we can <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and distinct perceive something, <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">we<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #c8e6c9; background: #e8f5e9\">can<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #66bb6a;\">ConfidenceHedged</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">know<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">that<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> it is true. However, he does not go into many details about what it means to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive something. But he finally defines it as that which is &quot;so transparently clear and at the same time so simple that we cannot ever think of them without believing them to be true&quot; (1). 
This is a very clear definition that would have been useful earlier on in the Meditations. In addition, Descartes&#x27; response to the objector gives us another <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">proof<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">of<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> the clear and distinct perception argument. As we have already established in class, the argument is flawed on many different levels. But Descartes still remains <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">absolutely<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">convinced<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> of the validity of the clear and distinct perception argument, so he attempts to advance another separate explanation for it. In it, Descartes provides us with a clear and thought-out argument about why it is impossible to doubt that which we <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">clearly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> and <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">distinctly<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> perceive. 
Although Descartes argument about clear and distinct perception has it s problems, this excerpt helps the reader understand the concept more. As we discussed in class, Descartes never completely explains why he is not creating what has been referred to as the &quot;Cartesian Circle&quot;. But this did not stop him from advocating it as a way for us to <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">definitively<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">know<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">that<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> God exists. Descartes was <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">very<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> <span style=\"padding: 2px; border-radius: 4px; border: 1px solid #bbdefb; background: #e3f2fd\">sure<span style=\"vertical-align: middle; margin-left: 2px; font-size: 0.7em; color: #64b5f6;\">ConfidenceHigh</span></span> that the argument of clear and distinct perception was powerful and this excerpt lets us inside of his head on the idea. As much as his argument for clear and distinct perception has aligned, one cannot argue that he did not put any thought into it. </div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "text = ''.join(df_ds['Token'].to_list())\n",
    "spans = list(zip(list(df_c['tag_start']), list(df_c['tag_end']), list(df_c['Tag'])))\n",
    "show_span_box_markup(text, spans)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8332b24f",
   "metadata": {},
   "source": [
    "## Compatability with tmtoolkit\n",
    "\n",
    "The **docuscospacy** package not longer requires **tmtoolkit** as a dependency. However, there some functions are included that allow users to move data between the two.\n",
    "\n",
    "All necessary pre-processing  is now done inside the `docuscope_parse` function. If you choose to use tmtoolkit, you will need to explicitly define your own pre-processing function. **For accurate tagging**, possessive *its* should be split into two tokens. The last part of the function will eliminate carriage returns, tabs, extra spaces, etc.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note: Adding pre-processing functions**\n",
    "\n",
    "You can also pass other functions as part of the `raw_preproc` argument in a list. For example: `raw_preproc=[pre_process, simplify_unicode_chars]` would add a function built in to **tmtoolkit** that replaces accented with non accented characters.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "id": "d687cf40",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "from tmtoolkit.corpus import Corpus\n",
    "\n",
    "def pre_process(txt):\n",
    "    txt = re.sub(r'\\bits\\b', 'it s', txt)\n",
    "    txt = re.sub(r'\\bIts\\b', 'It s', txt)\n",
    "    txt = \" \".join(txt.split())\n",
    "    return(txt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "id": "635af7ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "corp = Corpus.from_folder('data/tar_corpus', spacy_instance=nlp, raw_preproc=[pre_process], spacy_token_attrs=['tag', 'ent_iob', 'ent_type', 'is_punct'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d95b1a1d",
   "metadata": {},
   "source": [
    "### Converting a corpus\n",
    "\n",
    "To convert a tmtoolkit Corpus object, use the `from_tmtoolkit` function.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note: `convert_corpus` function**\n",
    "\n",
    "Note that the `convert_corpus` function has been depreicated. Use the `from_tmtoolkit` function instead.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "id": "6d39f03c",
   "metadata": {},
   "outputs": [],
   "source": [
    "tm_corpus = ds.from_tmtoolkit(corp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c3f37de",
   "metadata": {},
   "source": [
    "The result is a dictionary, whose keys are the names of the corpus files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "id": "cac6a4a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><style>\n",
       ".dataframe > thead > tr,\n",
       ".dataframe > tbody > tr {\n",
       "  text-align: right;\n",
       "  white-space: pre-wrap;\n",
       "}\n",
       "</style>\n",
       "<small>shape: (5, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>doc_id</th><th>token</th><th>pos_tag</th><th>ds_tag</th><th>pos_id</th><th>ds_id</th></tr><tr><td>str</td><td>str</td><td>str</td><td>str</td><td>u32</td><td>u32</td></tr></thead><tbody><tr><td>&quot;acad_01&quot;</td><td>&quot;In &quot;</td><td>&quot;II&quot;</td><td>&quot;Untagged&quot;</td><td>1</td><td>1</td></tr><tr><td>&quot;acad_01&quot;</td><td>&quot;the &quot;</td><td>&quot;AT&quot;</td><td>&quot;Untagged&quot;</td><td>2</td><td>2</td></tr><tr><td>&quot;acad_01&quot;</td><td>&quot;field &quot;</td><td>&quot;NN1&quot;</td><td>&quot;Untagged&quot;</td><td>3</td><td>3</td></tr><tr><td>&quot;acad_01&quot;</td><td>&quot;of &quot;</td><td>&quot;IO&quot;</td><td>&quot;Untagged&quot;</td><td>4</td><td>4</td></tr><tr><td>&quot;acad_01&quot;</td><td>&quot;plant &quot;</td><td>&quot;NN1&quot;</td><td>&quot;InformationTopics&quot;</td><td>5</td><td>5</td></tr></tbody></table></div>"
      ],
      "text/plain": [
       "shape: (5, 6)\n",
       "┌─────────┬────────┬─────────┬───────────────────┬────────┬───────┐\n",
       "│ doc_id  ┆ token  ┆ pos_tag ┆ ds_tag            ┆ pos_id ┆ ds_id │\n",
       "│ ---     ┆ ---    ┆ ---     ┆ ---               ┆ ---    ┆ ---   │\n",
       "│ str     ┆ str    ┆ str     ┆ str               ┆ u32    ┆ u32   │\n",
       "╞═════════╪════════╪═════════╪═══════════════════╪════════╪═══════╡\n",
       "│ acad_01 ┆ In     ┆ II      ┆ Untagged          ┆ 1      ┆ 1     │\n",
       "│ acad_01 ┆ the    ┆ AT      ┆ Untagged          ┆ 2      ┆ 2     │\n",
       "│ acad_01 ┆ field  ┆ NN1     ┆ Untagged          ┆ 3      ┆ 3     │\n",
       "│ acad_01 ┆ of     ┆ IO      ┆ Untagged          ┆ 4      ┆ 4     │\n",
       "│ acad_01 ┆ plant  ┆ NN1     ┆ InformationTopics ┆ 5      ┆ 5     │\n",
       "└─────────┴────────┴─────────┴───────────────────┴────────┴───────┘"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tm_corpus.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9385723",
   "metadata": {},
   "source": [
    "A **dtm** can also be passed to **tmtoolkit** functions to create normalized counts (using the `tf_proportions` function), [tf-idf values](https://tmtoolkit.readthedocs.io/en/latest/bow.html#Term-frequency%E2%80%93inverse-document-frequency-transformation-(tf-idf)) (using the `tfidf` function), or other kinds of data structures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "id": "f9514c93",
   "metadata": {},
   "outputs": [],
   "source": [
    "from tmtoolkit.bow.bow_stats import tf_proportions, tfidf\n",
    "from tmtoolkit.bow.dtm import dtm_to_dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9a9b75e",
   "metadata": {},
   "source": [
    "Beginning with version 0.12.0 of **tmtoolkit**, matrices must first be converted into a COOrdinate format. This can be done using the `dtm_to_coo` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "id": "a0d22422",
   "metadata": {},
   "outputs": [],
   "source": [
    "tags_coo, docs, vocab = ds.dtm_to_coo(tm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "id": "3d885d31",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<COOrdinate sparse matrix of dtype 'uint32'\n",
       "\twith 1657 stored elements and shape (50, 37)>"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tags_coo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "067857bf",
   "metadata": {},
   "source": [
    "The matrix, document labels, and vocabulary can now be processed using various **tmtoolkit** functions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "id": "899d1906",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Untagged</th>\n",
       "      <th>AcademicTerms</th>\n",
       "      <th>Character</th>\n",
       "      <th>Narrative</th>\n",
       "      <th>Description</th>\n",
       "      <th>InformationExposition</th>\n",
       "      <th>InformationTopics</th>\n",
       "      <th>Negative</th>\n",
       "      <th>Positive</th>\n",
       "      <th>MetadiscourseCohesive</th>\n",
       "      <th>Reasoning</th>\n",
       "      <th>ForceStressed</th>\n",
       "      <th>PublicTerms</th>\n",
       "      <th>Strategic</th>\n",
       "      <th>InformationStates</th>\n",
       "      <th>InformationChange</th>\n",
       "      <th>ConfidenceHedged</th>\n",
       "      <th>InformationReportVerbs</th>\n",
       "      <th>Citation</th>\n",
       "      <th>InformationPlace</th>\n",
       "      <th>Interactive</th>\n",
       "      <th>Inquiry</th>\n",
       "      <th>Future</th>\n",
       "      <th>ConfidenceHigh</th>\n",
       "      <th>Contingent</th>\n",
       "      <th>AcademicWritingMoves</th>\n",
       "      <th>Facilitate</th>\n",
       "      <th>MetadiscourseInteractive</th>\n",
       "      <th>Updates</th>\n",
       "      <th>InformationChangePositive</th>\n",
       "      <th>CitationAuthority</th>\n",
       "      <th>FirstPerson</th>\n",
       "      <th>Responsibility</th>\n",
       "      <th>InformationChangeNegative</th>\n",
       "      <th>Uncertainty</th>\n",
       "      <th>ConfidenceLow</th>\n",
       "      <th>CitationHedged</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>acad_01.txt</th>\n",
       "      <td>324</td>\n",
       "      <td>127</td>\n",
       "      <td>15</td>\n",
       "      <td>66</td>\n",
       "      <td>70</td>\n",
       "      <td>57</td>\n",
       "      <td>15</td>\n",
       "      <td>10</td>\n",
       "      <td>9</td>\n",
       "      <td>12</td>\n",
       "      <td>26</td>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "      <td>10</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "      <td>17</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>18</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>16</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_02.txt</th>\n",
       "      <td>760</td>\n",
       "      <td>255</td>\n",
       "      <td>79</td>\n",
       "      <td>133</td>\n",
       "      <td>132</td>\n",
       "      <td>157</td>\n",
       "      <td>74</td>\n",
       "      <td>67</td>\n",
       "      <td>66</td>\n",
       "      <td>97</td>\n",
       "      <td>51</td>\n",
       "      <td>54</td>\n",
       "      <td>18</td>\n",
       "      <td>24</td>\n",
       "      <td>33</td>\n",
       "      <td>40</td>\n",
       "      <td>60</td>\n",
       "      <td>38</td>\n",
       "      <td>12</td>\n",
       "      <td>9</td>\n",
       "      <td>22</td>\n",
       "      <td>8</td>\n",
       "      <td>20</td>\n",
       "      <td>20</td>\n",
       "      <td>38</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>8</td>\n",
       "      <td>26</td>\n",
       "      <td>3</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_03.txt</th>\n",
       "      <td>2392</td>\n",
       "      <td>844</td>\n",
       "      <td>465</td>\n",
       "      <td>422</td>\n",
       "      <td>435</td>\n",
       "      <td>428</td>\n",
       "      <td>240</td>\n",
       "      <td>201</td>\n",
       "      <td>160</td>\n",
       "      <td>142</td>\n",
       "      <td>160</td>\n",
       "      <td>126</td>\n",
       "      <td>52</td>\n",
       "      <td>78</td>\n",
       "      <td>124</td>\n",
       "      <td>130</td>\n",
       "      <td>137</td>\n",
       "      <td>57</td>\n",
       "      <td>415</td>\n",
       "      <td>49</td>\n",
       "      <td>39</td>\n",
       "      <td>82</td>\n",
       "      <td>42</td>\n",
       "      <td>30</td>\n",
       "      <td>43</td>\n",
       "      <td>20</td>\n",
       "      <td>28</td>\n",
       "      <td>31</td>\n",
       "      <td>21</td>\n",
       "      <td>47</td>\n",
       "      <td>23</td>\n",
       "      <td>42</td>\n",
       "      <td>3</td>\n",
       "      <td>32</td>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_04.txt</th>\n",
       "      <td>373</td>\n",
       "      <td>72</td>\n",
       "      <td>28</td>\n",
       "      <td>64</td>\n",
       "      <td>161</td>\n",
       "      <td>73</td>\n",
       "      <td>29</td>\n",
       "      <td>31</td>\n",
       "      <td>42</td>\n",
       "      <td>39</td>\n",
       "      <td>35</td>\n",
       "      <td>17</td>\n",
       "      <td>22</td>\n",
       "      <td>35</td>\n",
       "      <td>12</td>\n",
       "      <td>12</td>\n",
       "      <td>19</td>\n",
       "      <td>23</td>\n",
       "      <td>3</td>\n",
       "      <td>9</td>\n",
       "      <td>7</td>\n",
       "      <td>6</td>\n",
       "      <td>11</td>\n",
       "      <td>4</td>\n",
       "      <td>6</td>\n",
       "      <td>24</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_05.txt</th>\n",
       "      <td>651</td>\n",
       "      <td>200</td>\n",
       "      <td>47</td>\n",
       "      <td>133</td>\n",
       "      <td>172</td>\n",
       "      <td>79</td>\n",
       "      <td>77</td>\n",
       "      <td>73</td>\n",
       "      <td>18</td>\n",
       "      <td>42</td>\n",
       "      <td>52</td>\n",
       "      <td>33</td>\n",
       "      <td>2</td>\n",
       "      <td>14</td>\n",
       "      <td>33</td>\n",
       "      <td>65</td>\n",
       "      <td>21</td>\n",
       "      <td>27</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>21</td>\n",
       "      <td>5</td>\n",
       "      <td>19</td>\n",
       "      <td>17</td>\n",
       "      <td>7</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Untagged  AcademicTerms  ...  ConfidenceLow  CitationHedged\n",
       "acad_01.txt       324            127  ...              0               0\n",
       "acad_02.txt       760            255  ...              1               1\n",
       "acad_03.txt      2392            844  ...              1               3\n",
       "acad_04.txt       373             72  ...              0               0\n",
       "acad_05.txt       651            200  ...              1               0\n",
       "\n",
       "[5 rows x 37 columns]"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dtm_to_dataframe(tags_coo, docs, vocab).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "id": "629b87b1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Untagged</th>\n",
       "      <th>AcademicTerms</th>\n",
       "      <th>Character</th>\n",
       "      <th>Narrative</th>\n",
       "      <th>Description</th>\n",
       "      <th>InformationExposition</th>\n",
       "      <th>InformationTopics</th>\n",
       "      <th>Negative</th>\n",
       "      <th>Positive</th>\n",
       "      <th>MetadiscourseCohesive</th>\n",
       "      <th>Reasoning</th>\n",
       "      <th>ForceStressed</th>\n",
       "      <th>PublicTerms</th>\n",
       "      <th>Strategic</th>\n",
       "      <th>InformationStates</th>\n",
       "      <th>InformationChange</th>\n",
       "      <th>ConfidenceHedged</th>\n",
       "      <th>InformationReportVerbs</th>\n",
       "      <th>Citation</th>\n",
       "      <th>InformationPlace</th>\n",
       "      <th>Interactive</th>\n",
       "      <th>Inquiry</th>\n",
       "      <th>Future</th>\n",
       "      <th>ConfidenceHigh</th>\n",
       "      <th>Contingent</th>\n",
       "      <th>AcademicWritingMoves</th>\n",
       "      <th>Facilitate</th>\n",
       "      <th>MetadiscourseInteractive</th>\n",
       "      <th>Updates</th>\n",
       "      <th>InformationChangePositive</th>\n",
       "      <th>CitationAuthority</th>\n",
       "      <th>FirstPerson</th>\n",
       "      <th>Responsibility</th>\n",
       "      <th>InformationChangeNegative</th>\n",
       "      <th>Uncertainty</th>\n",
       "      <th>ConfidenceLow</th>\n",
       "      <th>CitationHedged</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>acad_01.txt</th>\n",
       "      <td>0.258933</td>\n",
       "      <td>0.101495</td>\n",
       "      <td>0.011988</td>\n",
       "      <td>0.052746</td>\n",
       "      <td>0.055942</td>\n",
       "      <td>0.045553</td>\n",
       "      <td>0.012160</td>\n",
       "      <td>0.007992</td>\n",
       "      <td>0.007193</td>\n",
       "      <td>0.009590</td>\n",
       "      <td>0.020779</td>\n",
       "      <td>0.005594</td>\n",
       "      <td>0.003197</td>\n",
       "      <td>0.007992</td>\n",
       "      <td>0.007403</td>\n",
       "      <td>0.007992</td>\n",
       "      <td>0.011988</td>\n",
       "      <td>0.013586</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.002432</td>\n",
       "      <td>0.014593</td>\n",
       "      <td>0.002504</td>\n",
       "      <td>0.002398</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.013357</td>\n",
       "      <td>0.000811</td>\n",
       "      <td>0.002398</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000874</td>\n",
       "      <td>0.001834</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.001964</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_02.txt</th>\n",
       "      <td>0.222591</td>\n",
       "      <td>0.074685</td>\n",
       "      <td>0.023138</td>\n",
       "      <td>0.038953</td>\n",
       "      <td>0.038660</td>\n",
       "      <td>0.045983</td>\n",
       "      <td>0.021986</td>\n",
       "      <td>0.019623</td>\n",
       "      <td>0.019330</td>\n",
       "      <td>0.028410</td>\n",
       "      <td>0.014937</td>\n",
       "      <td>0.015816</td>\n",
       "      <td>0.005272</td>\n",
       "      <td>0.007029</td>\n",
       "      <td>0.009948</td>\n",
       "      <td>0.011715</td>\n",
       "      <td>0.017573</td>\n",
       "      <td>0.011130</td>\n",
       "      <td>0.003843</td>\n",
       "      <td>0.002928</td>\n",
       "      <td>0.006536</td>\n",
       "      <td>0.002377</td>\n",
       "      <td>0.006119</td>\n",
       "      <td>0.005858</td>\n",
       "      <td>0.011455</td>\n",
       "      <td>0.001530</td>\n",
       "      <td>0.002080</td>\n",
       "      <td>0.000879</td>\n",
       "      <td>0.002412</td>\n",
       "      <td>0.008327</td>\n",
       "      <td>0.001008</td>\n",
       "      <td>0.003558</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000920</td>\n",
       "      <td>0.000395</td>\n",
       "      <td>0.000607</td>\n",
       "      <td>0.000734</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_03.txt</th>\n",
       "      <td>0.216396</td>\n",
       "      <td>0.076354</td>\n",
       "      <td>0.042067</td>\n",
       "      <td>0.038177</td>\n",
       "      <td>0.039353</td>\n",
       "      <td>0.038720</td>\n",
       "      <td>0.022025</td>\n",
       "      <td>0.018184</td>\n",
       "      <td>0.014475</td>\n",
       "      <td>0.012846</td>\n",
       "      <td>0.014475</td>\n",
       "      <td>0.011399</td>\n",
       "      <td>0.004704</td>\n",
       "      <td>0.007056</td>\n",
       "      <td>0.011546</td>\n",
       "      <td>0.011761</td>\n",
       "      <td>0.012394</td>\n",
       "      <td>0.005157</td>\n",
       "      <td>0.041056</td>\n",
       "      <td>0.004925</td>\n",
       "      <td>0.003579</td>\n",
       "      <td>0.007525</td>\n",
       "      <td>0.003969</td>\n",
       "      <td>0.002714</td>\n",
       "      <td>0.004004</td>\n",
       "      <td>0.001890</td>\n",
       "      <td>0.002570</td>\n",
       "      <td>0.002804</td>\n",
       "      <td>0.001955</td>\n",
       "      <td>0.004650</td>\n",
       "      <td>0.002388</td>\n",
       "      <td>0.005129</td>\n",
       "      <td>0.000334</td>\n",
       "      <td>0.004544</td>\n",
       "      <td>0.001099</td>\n",
       "      <td>0.000188</td>\n",
       "      <td>0.000680</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_04.txt</th>\n",
       "      <td>0.216174</td>\n",
       "      <td>0.041728</td>\n",
       "      <td>0.016228</td>\n",
       "      <td>0.037091</td>\n",
       "      <td>0.093308</td>\n",
       "      <td>0.042307</td>\n",
       "      <td>0.017049</td>\n",
       "      <td>0.017966</td>\n",
       "      <td>0.024341</td>\n",
       "      <td>0.022603</td>\n",
       "      <td>0.020284</td>\n",
       "      <td>0.009852</td>\n",
       "      <td>0.012750</td>\n",
       "      <td>0.020284</td>\n",
       "      <td>0.007158</td>\n",
       "      <td>0.006955</td>\n",
       "      <td>0.011012</td>\n",
       "      <td>0.013330</td>\n",
       "      <td>0.001901</td>\n",
       "      <td>0.005795</td>\n",
       "      <td>0.004115</td>\n",
       "      <td>0.003527</td>\n",
       "      <td>0.006659</td>\n",
       "      <td>0.002318</td>\n",
       "      <td>0.003579</td>\n",
       "      <td>0.014530</td>\n",
       "      <td>0.007055</td>\n",
       "      <td>0.000580</td>\n",
       "      <td>0.000597</td>\n",
       "      <td>0.001268</td>\n",
       "      <td>0.001330</td>\n",
       "      <td>0.000782</td>\n",
       "      <td>0.001425</td>\n",
       "      <td>0.000910</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acad_05.txt</th>\n",
       "      <td>0.241753</td>\n",
       "      <td>0.074271</td>\n",
       "      <td>0.017454</td>\n",
       "      <td>0.049390</td>\n",
       "      <td>0.063873</td>\n",
       "      <td>0.029337</td>\n",
       "      <td>0.029007</td>\n",
       "      <td>0.027109</td>\n",
       "      <td>0.006684</td>\n",
       "      <td>0.015597</td>\n",
       "      <td>0.019311</td>\n",
       "      <td>0.012255</td>\n",
       "      <td>0.000743</td>\n",
       "      <td>0.005199</td>\n",
       "      <td>0.012614</td>\n",
       "      <td>0.024138</td>\n",
       "      <td>0.007798</td>\n",
       "      <td>0.010027</td>\n",
       "      <td>0.001218</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.002637</td>\n",
       "      <td>0.003767</td>\n",
       "      <td>0.008146</td>\n",
       "      <td>0.001857</td>\n",
       "      <td>0.007262</td>\n",
       "      <td>0.006595</td>\n",
       "      <td>0.002637</td>\n",
       "      <td>0.001857</td>\n",
       "      <td>0.001147</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000501</td>\n",
       "      <td>0.000913</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000770</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Untagged  AcademicTerms  ...  ConfidenceLow  CitationHedged\n",
       "acad_01.txt  0.258933       0.101495  ...       0.000000        0.000000\n",
       "acad_02.txt  0.222591       0.074685  ...       0.000607        0.000734\n",
       "acad_03.txt  0.216396       0.076354  ...       0.000188        0.000680\n",
       "acad_04.txt  0.216174       0.041728  ...       0.000000        0.000000\n",
       "acad_05.txt  0.241753       0.074271  ...       0.000770        0.000000\n",
       "\n",
       "[5 rows x 37 columns]"
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tfidf_coo = tfidf(tags_coo)\n",
    "dtm_to_dataframe(tfidf_coo, docs, vocab).head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ds_test",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}