<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Piet Stam - My blog posts</title>
<link>https://www.pietstam.nl/posts.html</link>
<atom:link href="https://www.pietstam.nl/posts.xml" rel="self" type="application/rss+xml"/>
<description></description>
<image>
<url>https://www.pietstam.nl/profile.png</url>
<title>Piet Stam - My blog posts</title>
<link>https://www.pietstam.nl/posts.html</link>
<height>144</height>
<width>144</width>
</image>
<generator>quarto-1.8.27</generator>
<lastBuildDate>Sat, 14 Mar 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>A privacy-first workflow for processing documents, videos, podcasts, and RSS feeds with local AI</title>
  <dc:creator>Piet Stam</dc:creator>
  <dc:creator>Claude Sonnet 4.6</dc:creator>
  <link>https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>In 2023, I wrote about my <a href="https://www.pietstam.nl/posts/2023-07-04-first-strategic-analysis-with-chatgpt/first-strategic-analysis-with-chatgpt.html">first experiment with ChatGPT</a> for a strategic analysis task. Fun times, though in retrospect it was roughly as sophisticated as asking a very literate parrot to summarize a website. Still, it worked, and I was excited.</p>
<p>Three years later, bigger problems remain unsolved — climate, geopolitics, the persistent question of why printers still don’t work — but in the meantime I spent three days building a research workflow. This post is about that, and about why I finally started using Obsidian for knowledge management after years of good intentions.</p>
</section>
<section id="the-actual-problem" class="level2">
<h2 class="anchored" data-anchor-id="the-actual-problem">The actual problem</h2>
<p>I work on health policy topics, which means sources arrive from all directions: academic papers, policy documents, the occasional YouTube lecture from someone who really should have been a professor but ended up on the internet instead, podcasts, RSS feeds. Keeping track of all this was less of a system and more of an archaeology project — layers of half-processed material accumulating in various apps, with no clear path from “I found this interesting thing” to “I actually know something about this topic now.”</p>
<p>The obvious solution is a knowledge management tool. Obsidian is the one everyone recommends. I tried it twice and abandoned it within a week both times. The reason was always the same: manually adding notes, manually creating links between them, manually tagging everything. The whole premise of Obsidian, that you build up a connected web of knowledge over time, sounds wonderful until you’re staring at a blank note at 10pm after a full day of meetings, trying to remember why you bookmarked a paper about care substitution in 2022.</p>
</section>
<section id="a-cloud-solution-detour" class="level2">
<h2 class="anchored" data-anchor-id="a-cloud-solution-detour">A cloud solution detour</h2>
<p>A while ago I tried <a href="https://www.getrecall.ai/">Recall.ai</a>, which promises to solve exactly this problem: save anything to it, and it automatically connects related items, summarizes them, and surfaces connections you hadn’t noticed. I liked the concept. The execution was genuinely impressive; many thanks for the experience! But I was hesitant about uploading PDFs of academic papers, which come from journals that are already not thrilled about their content being distributed. Copyrighted material, on a cloud service, being processed by models I know nothing about: that felt like a problem I would be creating for future me, so I quit.</p>
<p>What I actually wanted was Recall.ai’s <em>idea</em> — automated linking, surfaced connections, growing smarter over time — but running entirely on my own machine.</p>
</section>
<section id="the-local-setup" class="level2">
<h2 class="anchored" data-anchor-id="the-local-setup">The local setup</h2>
<p>The hardware made this feasible in a way it wouldn’t have been two years ago. A Mac mini M4 with 24 GB of unified memory can run local language models at useful speeds, and its GPU handles audio transcription fast enough that a one-hour podcast takes about four minutes to process. That’s the boring infrastructure detail that makes everything else possible.</p>
<p>The software stack ended up being: <strong>Zotero</strong> as the central reference manager and inbox, <strong>Obsidian</strong> as the knowledge vault, <strong>Ollama</strong> for running a local language model (Qwen3.5:9b), <strong>yt-dlp</strong> for downloading transcripts and audio, and <strong>whisper.cpp</strong> for local transcription. None of this is new software — what took work was connecting them into something that actually flows.</p>
<p>The result is documented in a GitHub repository called <a href="https://github.com/pjastam/ResearchVault">ResearchVault</a>, with a full <a href="https://pjastam.github.io/ResearchVault/">installation guide</a> if you want to try it yourself. Fair warning: it requires comfort with the command line — I run everything from the integrated terminal in VS Code. It is emphatically not a one-click setup.</p>
<p>Two newsletters were an important source of inspiration along the way: <a href="https://www.aireport.email/">AI Report</a> for its enthusiastic coverage of what’s becoming possible with Claude Code, and <a href="https://causalinf.substack.com/">Causal Inference with AI</a> for its focus on applying AI in academic research specifically. Both are worth following if this kind of thing interests you.</p>
</section>
<section id="how-the-workflow-actually-works" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="how-the-workflow-actually-works">How the workflow actually works</h2>
<p>The design follows a 3-phase model, and the structure matters more than any individual tool.</p>
<p><strong>Phase 1 — Cast wide.</strong> Everything goes into a single Zotero <code>_inbox</code> collection via the browser extension or iOS share sheet. No filtering, no judgment, no friction. The bar for adding something is essentially zero.</p>
<p><strong>Phase 2 — Filter.</strong> This is the part that makes the whole thing work. The local language model generates a 2–3 sentence summary per inbox item. I give a Go or No-go. That’s it. The point is to create a deliberate moment between “I found this” and “this is in my vault,” rather than letting everything accumulate until the vault becomes the same archaeological dig as everything before it.</p>
<p><strong>Phase 3 — Process.</strong> Approved items get full treatment: a structured literature note in Obsidian, automatically tagged and linked to related notes already in the vault. For videos and podcasts, the audio is downloaded and transcribed locally — no content leaves the machine.</p>
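<p>The Filter phase can be sketched in a few lines of Python. This is a minimal illustration, not the code from ResearchVault: the names <code>InboxItem</code>, <code>build_summary_request</code>, and <code>triage</code> are hypothetical, the lowercase model tag is an assumption based on the model named above, and the endpoint is Ollama’s default local address.</p>

```python
import json
import urllib.request
from dataclasses import dataclass

# Assumptions: Ollama's default local endpoint and a lowercase tag for the
# model named in this post. Adjust both to match your own installation.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3.5:9b"


@dataclass
class InboxItem:
    """Hypothetical stand-in for one entry in the Zotero _inbox collection."""
    title: str
    abstract: str


def build_summary_request(item):
    """Build the JSON payload for a non-streaming Ollama generate call."""
    prompt = (
        "Summarize the following source in 2-3 sentences.\n\n"
        f"Title: {item.title}\n\n{item.abstract}"
    )
    return {"model": MODEL, "prompt": prompt, "stream": False}


def summarize_with_ollama(item):
    """Ask the local model for a summary (requires a running Ollama server)."""
    data = json.dumps(build_summary_request(item)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def triage(items, summarize, decide):
    """Return only the items that get a Go after their summary is shown."""
    approved = []
    for item in items:
        summary = summarize(item)
        if decide(item, summary):  # the human Go / No-go moment
            approved.append(item)
    return approved
```

<p>Passing <code>summarize</code> and <code>decide</code> in as functions keeps the human decision separate from the model call, so the loop can be tried out with a dummy summarizer before pointing it at <code>summarize_with_ollama</code>.</p>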
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/architecture-diagram-v1.12-dark.svg" class="img-fluid figure-img"></p>
<figcaption>Architecture diagram of the local research workflow showing the three phases: Cast Wide, Filter, and Process</figcaption>
</figure>
</div>
<p>The linking is what finally made Obsidian work for me. When a new paper is processed, the system looks at what’s already in the vault and adds <code>[[double bracket links]]</code> to related notes. Over time, this builds the connected knowledge graph that Obsidian promises — not because I remember to do it manually after a long day, but because it happens automatically every time I approve a new item. Flashcards get generated in the process too, which means I’m actually retaining more of what I read rather than filing papers I’ll never think about again.<sup>1</sup></p>
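<p>To make the linking idea concrete, here is a deliberately naive sketch. It only matches literal note titles in the text, which is an assumption made for illustration; the actual workflow relies on the language model to find related notes, and <code>add_wikilinks</code> is a hypothetical name.</p>

```python
import re

# Matches an existing [[wikilink]] so it can be left untouched.
EXISTING_LINK = re.compile(r"(\[\[[^\]]*\]\])")


def add_wikilinks(note_text, vault_titles):
    """Wrap the first plain mention of each known note title in [[...]]."""
    text = note_text
    # Longest titles first, so a long title is linked before a shorter
    # title that happens to be contained in it.
    for title in sorted(vault_titles, key=len, reverse=True):
        mention = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        # Splitting on a capturing group puts existing links at odd indices.
        parts = EXISTING_LINK.split(text)
        for index, part in enumerate(parts):
            if index % 2 == 1:
                continue  # already a link
            replaced, count = mention.subn(
                lambda match: "[[" + title + "]]", part, count=1
            )
            if count:
                parts[index] = replaced
                break  # link only the first mention of each title
        text = "".join(parts)
    return text
```

<p>For example, <code>add_wikilinks("Notes on care substitution.", ["Care substitution"])</code> rewrites the mention to <code>[[Care substitution]]</code>, using the note title’s own capitalization.</p>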
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;Whether these flashcards justify three days of setup time is a question I am choosing not to examine too closely. I will give flashcards a try, but do not use this Obsidian plugin if you are reminded of your memory capabilities in a way that makes you feel bad about yourself.</p></div></div></section>
<section id="the-privacy-piece" class="level2">
<h2 class="anchored" data-anchor-id="the-privacy-piece">The privacy piece</h2>
<p>All the processing — model inference, transcription, note generation — runs locally. Zotero’s API is <code>localhost</code>-only. Obsidian stores plain Markdown files on your local disk. No copyrighted PDFs or confidential items go anywhere they shouldn’t.</p>
<p>The one honest exception: when I type instructions into Claude Code, those prompts reach Anthropic’s API. The <em>content</em> of papers and notes does not. That tradeoff felt acceptable, and it’s explicit rather than buried in a terms-of-service document I’ll never read.</p>
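<p>One way to keep the local-only promise honest is to check every endpoint a script talks to before sending content to it. The sketch below is an illustration under assumptions, not part of ResearchVault: <code>is_loopback_url</code> is a hypothetical helper, and the example addresses are the default local ports of Zotero and Ollama.</p>

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url):
    """Return True only if the URL points at this machine."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback address ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as remote.
        return False


# Assumed defaults: Zotero's local API and Ollama's local server.
ENDPOINTS = [
    "http://localhost:23119/api/",
    "http://127.0.0.1:11434/api/generate",
]

for endpoint in ENDPOINTS:
    if not is_loopback_url(endpoint):
        raise RuntimeError("Refusing to send content to " + endpoint)
```

<p>The check is intentionally strict: a hostname that merely resolves to the local machine is still rejected, because the script does not perform any DNS lookups.</p>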
</section>
<section id="a-few-words-about-the-ai-assisted-coding-process" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="a-few-words-about-the-ai-assisted-coding-process">A few words about the AI-assisted coding process</h2>
<p>Claude helped me enormously in building the entire local workflow. I acted like the CEO of a firm or the conductor of an orchestra, guiding Claude to build the configuration files, the skill file that drives the workflow, and the documentation. That includes a first draft of this blog post, which, after some redrafting, I edited into something more closely resembling my own voice. The back-and-forth over three days was more useful than going fast, because several early design assumptions, on both sides, turned out to be wrong, and catching them early mattered.</p>
<p>This is not a post about AI being miraculous (although, I must admit, the feeling was there at times). It’s a post about a workflow for managing research material. The AI assistance was a means, not the point — roughly as interesting to mention as the fact that I used a text editor. Which is why we listed Claude as a co-author: honest about what happened, but not making it the headline.<sup>2</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn2"><p><sup>2</sup>&nbsp;I am first author of this blog post — but is that fair? The question is not unique to AI collaboration. A conductor vs the orchestra, a CEO vs the employees, a professor vs the PhD students: in each case, one person mainly sets the direction while others do the work that makes it real. The division of labour is clear enough, but we humans — especially in academia — are obsessed with ranking contributions and putting someone at the top of some list. Which I did here too. Sorry, Claude.</p></div></div></section>
<section id="what-its-like-to-actually-use-it" class="level2">
<h2 class="anchored" data-anchor-id="what-its-like-to-actually-use-it">What it’s like to actually use it</h2>
<p>We have just started, but the vault is already growing in a way that feels meaningful rather than just accumulating. Papers from different research lines are starting to link to each other in ways I hadn’t made explicit before. The connected graph that Obsidian keeps showing me in its visualization pane is starting to look less like a joke and more like something actually useful.</p>
<p>Whether all of this makes me a more productive researcher, or simply a researcher with a more aesthetically pleasing note-taking system, remains to be determined. I am still hesitating to open one issue myself, namely phasing out Claude in its orchestrating role in this workflow, because that would be a very harsh thing to do to a co-author. Anyhow, happy (vibe) coding!</p>
</section>
<section id="a-note-on-what-this-isnt" class="level2">
<h2 class="anchored" data-anchor-id="a-note-on-what-this-isnt">A note on what this isn’t</h2>
<p>This is a personal system that happens to be documented well enough to share. It works for my situation: a Mac mini M4, comfort with terminal commands, and academic research (or other serious stuff that needs connecting the dots). If you’re on Windows, or if command lines make you nervous, or if your primary source material is TikTok, this is probably not for you. That’s fine.</p>
<p>If you try it and hit rough edges, the <a href="https://github.com/pjastam/ResearchVault/issues">GitHub issues</a> are open.</p>
</section>
<section id="citation" class="level2">
<h2 class="anchored" data-anchor-id="citation">Citation</h2>
<p>BibTeX citation:</p>
<pre><code>@online{stam2026,
  author = {Stam, Piet and {Claude Sonnet 4.6}},
  title = {A privacy-first workflow for processing documents, videos, podcasts, and RSS feeds with local AI},
  date = {2026-03-14},
  url = {https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/},
  langid = {en}
}</code></pre>
<p>For attribution, please cite this work as:</p>
<p>Stam, Piet, and Claude Sonnet 4.6. 2026. “A privacy-first workflow for processing documents, videos, podcasts, and RSS feeds with local AI.” March 14, 2026. <a href="https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/" class="uri">https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/</a></p>


</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2026,
  author = {Stam, Piet and {Claude Sonnet 4.6}},
  title = {A Privacy-First Workflow for Processing Documents, Videos,
    Podcasts, and {RSS} Feeds with Local {AI}},
  date = {2026-03-14},
  url = {https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2026" class="csl-entry quarto-appendix-citeas">
Stam, Piet, and Claude Sonnet 4.6. 2026. <span>“A Privacy-First Workflow
for Processing Documents, Videos, Podcasts, and RSS Feeds with Local
AI.”</span> March 14, 2026. <a href="https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/">https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/</a>.
</div></div></section></div> ]]></description>
  <category>digital transformation</category>
  <guid>https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/</guid>
  <pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2026-03-14-building-a-privacy-first-research-workflow/workflow-bw-simple.png" medium="image" type="image/png" height="40" width="144"/>
</item>
<item>
  <title>Accelerating strategic analysis: unleashing ChatGPT and Beautiful.ai</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2023-07-04-first-strategic-analysis-with-chatgpt/first-strategic-analysis-with-chatgpt.html</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Greetings, entrepreneurs and innovators! As a co-founder of two consulting firms, I’m always committed to exploring the forefront of integrating new tools into traditional methodologies. Today, I invite you on an exciting journey where cutting-edge technology meets strategic analysis. In this post, I’ll share how I leveraged <a href="https://chat.openai.com/">ChatGPT</a> and <a href="https://www.beautiful.ai/">Beautiful.ai</a> to turbocharge my strategic analysis for startup company <a href="https://students.flowley.nl/">Flowley</a>, offering services to students in the Netherlands for maintaining a healthy work-life balance.</p>
<p>Oh, and by the way: the post that you are reading right now is a somewhat edited version of my final text. Edited by ChatGPT, of course. And the image is created by <a href="https://www.bing.com/images/create/a-laptop-with-icons-of-a-slide-deck/64a32ec6db924606893070cc7bd2be6f?id=FTC3ZH4tqKeyF9Qh%2fZv%2fmg%3d%3d&amp;view=detailv2&amp;idpp=genimg&amp;idpclose=1&amp;FORM=SYDBIC">Bing</a> with the help of <a href="https://openai.com/dall-e-2">DALL·E</a>.</p>
</section>
<section id="the-quest-begins-chatgpt-and-flowley" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="the-quest-begins-chatgpt-and-flowley">The Quest Begins: ChatGPT and Flowley</h2>
<p>Preparing for an initial meeting with Flowley, I sought to conduct a comprehensive strategic analysis based on the information available on their website. My eureka moment was to engage ChatGPT, the AI wizard from <a href="https://openai.com/">OpenAI</a>, to perform a SWOT-analysis and offer insights into their strategic positioning.</p>
<p>My conversation with ChatGPT can be found <a href="https://chat.openai.com/share/1f3096cf-76a7-42c4-a03e-e476d137387e">here</a>. My side of the conversation consists of the three tasks that I asked ChatGPT to perform:</p>
<ol type="1">
<li><p>Please do a SWOT-analysis of Flowley, based on a description of the profile of the company.</p></li>
<li><p>Now update the SWOT analysis with information on a list of tools that the company offers.</p></li>
<li><p>And can you describe the intersection of the values (mission/scope/value), capabilities (strength/competitive advantage) and opportunities (market demand, who else, if anyone offers this proposition?). Balancing this intersection is what is commonly called “the strategist’s challenge”.<sup>1</sup></p></li>
</ol>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;<a href="https://www.darden.virginia.edu/faculty-research/directory/jared-d-harris">Harris, J.D.</a>, <a href="https://www.darden.virginia.edu/faculty-research/directory/michael-lenox">Lenox, M.J.</a>, 2013. <a href="https://www.amazon.com/Strategists-Toolkit-Jared-D-Harris/dp/1615981977/">The Strategist’s Toolkit</a>, 7th edition. Darden Business Publishing.</p></div></div><p>I fed ChatGPT the information on the front page of Flowley’s website. On that front page, there is <a href="https://uu.flowley.nl/#flex-page">an introductory text</a>, after which you have two choices to continue reading: “<em>Get started right away</em>” (in Dutch: “<a href="https://uu.flowley.nl/#theme-list"><em>Ga direct aan de slag</em></a>”) or “<em>See how it works</em>” (in Dutch: “<a href="https://uu.flowley.nl/#more-about"><em>Bekijk hoe het werkt</em></a>”). For the first task, I gave ChatGPT the information behind the latter choice. For the second task, ChatGPT got the information behind the former choice, together with the aforementioned introductory text. No additional information was given for executing the third task.</p>
</section>
<section id="microsoft-powerpoint-of-course" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="microsoft-powerpoint-of-course">Microsoft PowerPoint, of course?</h2>
<p>The results obtained from ChatGPT became the cornerstone of my preparations, but I knew that I needed a compelling visual aid to win hearts and minds in the meeting. I must confess: <a href="https://en.wikipedia.org/wiki/Classical_conditioning">Pavlovian conditioning</a> led me to fire up <a href="https://www.microsoft.com/nl-nl/microsoft-365/powerpoint">Microsoft PowerPoint</a> without thinking twice and to choose a nice template to fill with the text that was produced by ChatGPT (with a little help from me feeding the beast). I used the free “SWOT Analysis Template 3 (16:9)” as designed by <a href="https://www.superside.com/">Superside</a>.<sup>2</sup> And although it took me some time to finish this, <a href="chatgpt.pdf">the result</a> was what I needed to prepare for my meeting with the startup. Off I go.</p>
<div class="no-row-height column-margin column-container"><div id="fn2"><p><sup>2</sup>&nbsp;Superside.com, 25+ SWOT Analysis Templates (Free Downloads), https://www.superside.com/blog/swot-analysis-templates (accessed 6.13.23).</p></div></div><p>A few days later, after the first meeting with Flowley, I realized that the manual process of producing the slides could also be made more efficient. So, as an exercise, I fed the text produced by ChatGPT to Beautiful.ai. Does this lead to slides that we prefer over those created manually in PowerPoint? Here is what happened.</p>
</section>
<section id="a-stroke-of-beauty-enlisting-beautiful.ai" class="level2">
<h2 class="anchored" data-anchor-id="a-stroke-of-beauty-enlisting-beautiful.ai">A Stroke of Beauty: Enlisting Beautiful.ai</h2>
<p>After signing up for the free trial, I got a 2-minute intro <em>to become a pro</em> in using Beautiful.ai (as they claim). I decided that surfing to their “Designer Bot AI” was the way to go for me, because then I could feed it with the results produced by ChatGPT and ask it to compose a matching design without any further intervention by me.</p>
<p>My prompt starts with this intro:</p>
<p>“<em>A slide deck with the results of a strategic analysis for a startup company called Flowley that offers services to students to keep a healthy work-life balance. The strategic analysis consists of (1) the results of a SWOT-analysis and (2) the results of finding an optimal balance among their values (mission/scope/value), capabilities (strength/competitive advantage) and opportunities (market demand, who else, if anyone offers this proposition?). The results of the SWOT-analysis (1) are:</em>”</p>
<p>after which I copied the ChatGPT results of the second task as described above. I also added the text:</p>
<p>“<em>The results of the search for the right balance (2) are:</em>”,</p>
<p>after which I copied the ChatGPT results of the third task as described above. And then waited. And waited. And… Oops, there it was. Exactly 45 minutes after signing up with Beautiful.ai, there I had it: <a href="beautiful_ai_and_chatgpt.pdf">a beautifully designed slide deck</a> with background images that are very much on topic. The inclusion of relevant images and slides addressing additional analyses showcased Beautiful.ai’s flair for captivating presentations.</p>
</section>
<section id="clash-of-the-titans-powerpoint-vs.-beautiful.ai" class="level2">
<h2 class="anchored" data-anchor-id="clash-of-the-titans-powerpoint-vs.-beautiful.ai">Clash of the Titans: PowerPoint vs.&nbsp;Beautiful.ai</h2>
<p>So how does the PowerPoint slide deck compare to the Beautiful.ai slide deck? And how does ChatGPT fit in? Answering these questions was part of the evaluation phase of my journey, in which I talked with Flowley about the results. Without revealing the nitty-gritty details of the next steps of the strategic positioning of Flowley, of course.</p>
<p>In the epic clash of PowerPoint and Beautiful.ai, each had its strengths. In general, the conclusion in the meeting with Flowley was that the PowerPoint slides contain more to-the-point information than the slide deck generated by Beautiful.ai. This is especially the case as all the information of the SWOT-analysis by ChatGPT was manually integrated into the PowerPoint slides. Note that this even holds for the detailed explanations of the separate items of the SWOT-analysis: these explanations show up when hovering over them on the second slide of <a href="chatgpt.pptx">the original PowerPoint file</a>.</p>
<p>On the other hand, the slide deck produced by Beautiful.ai has a design including photos that more closely relates to the target group of Flowley. Moreover, the slide deck includes slides that are useful for having the discussion about strategic positioning with the customer. For example, there are slides about additional analyses that might be considered. And again on the design: it is very beautiful indeed.</p>
</section>
<section id="the-final-verdict" class="level2">
<h2 class="anchored" data-anchor-id="the-final-verdict">The Final Verdict</h2>
<p>In conclusion, the partnership between ChatGPT and Beautiful.ai was a game-changer for my strategic analysis use case. The time saved and the exceptional results achieved were awe-inspiring. It was like having an AI-powered strategy consultant at my beck and call! While a complete strategic positioning requires more than a SWOT-analysis, the combined power of ChatGPT and Beautiful.ai proved invaluable for swift scans and initial meetings.</p>


</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2023,
  author = {Stam, Piet},
  title = {Accelerating Strategic Analysis: Unleashing {ChatGPT} and
    {Beautiful.ai}},
  date = {2023-07-04},
  url = {https://www.pietstam.nl/posts/2023-07-04-first-strategic-analysis-with-chatgpt/first-strategic-analysis-with-chatgpt.html},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2023" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2023. <span>“Accelerating Strategic Analysis: Unleashing
ChatGPT and Beautiful.ai.”</span> July 4, 2023. <a href="https://www.pietstam.nl/posts/2023-07-04-first-strategic-analysis-with-chatgpt/first-strategic-analysis-with-chatgpt.html">https://www.pietstam.nl/posts/2023-07-04-first-strategic-analysis-with-chatgpt/first-strategic-analysis-with-chatgpt.html</a>.
</div></div></section></div> ]]></description>
  <category>digital transformation</category>
  <guid>https://www.pietstam.nl/posts/2023-07-04-first-strategic-analysis-with-chatgpt/first-strategic-analysis-with-chatgpt.html</guid>
  <pubDate>Tue, 04 Jul 2023 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2023-07-04-first-strategic-analysis-with-chatgpt/bing_laptop.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>GitHub template for Quarto website on Netlify</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/</link>
  <description><![CDATA[ 






<section id="use-case" class="level2">
<h2 class="anchored" data-anchor-id="use-case">Use case</h2>
<p>To enhance each of my research projects, I aim to create an all-inclusive compendium and share it through a captivating <a href="https://quarto.org/">Quarto</a> website. I am keen on constructing and deploying this website with the seamless integration of <a href="https://github.com/features/actions">GitHub Actions</a> and <a href="https://www.netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a>. By utilizing <a href="https://github.com/features/actions">GitHub Actions</a>, I can ensure that rendering and publishing processes are effortlessly triggered with every commit made to the remote repository on <a href="https://github.com/">GitHub</a>. To facilitate this, I rely on the powerful IDE called <a href="https://posit.co/products/open-source/rstudio/"><iconify-icon inline="" icon="cib:rstudio"></iconify-icon> RStudio</a>, which enables me to efficiently code the pipeline, including the creation of the <a href="https://quarto.org/">Quarto</a> website, setting up <a href="https://github.com/features/actions">GitHub Actions</a>, and establishing a connection with <a href="https://www.netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a>. Once implemented, any commit, regardless of the IDE used, will automatically build and deploy the website to <a href="https://www.netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a>. It would be delightful to utilize this as a template for all my future research projects.</p>
</section>
<section id="prerequisites" class="level2">
<h2 class="anchored" data-anchor-id="prerequisites">Prerequisites</h2>
<ul>
<li><p>The <a href="https://posit.co/products/open-source/rstudio/"><iconify-icon inline="" icon="cib:rstudio"></iconify-icon> RStudio</a> IDE (to code the pipeline)</p></li>
<li><p>A <a href="https://github.com/">GitHub</a> account (to be able to use <a href="https://github.com/features/actions">GitHub Actions</a>)</p></li>
<li><p>A <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> account (to publish the website to)</p></li>
</ul>
</section>
<section id="step-by-step-guide" class="level2">
<h2 class="anchored" data-anchor-id="step-by-step-guide">Step-by-step guide</h2>
<p>These are the steps to follow to build a <a href="https://quarto.org/">Quarto</a> website for <a href="../../posts/2023-05-29-deploy-site-to-netlify-with-github-actions/index.html#use-case">the use case</a>. In the next section, you will see how to get the same results by relying fully on <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> instead of <a href="https://github.com/features/actions">GitHub Actions</a>. After that, it is shown how to use the <a href="https://github.com/">GitHub</a> template. With this template, you do not need to work through this guide step by step each time you need a <a href="https://quarto.org/">Quarto</a> website as a starter.</p>
<ol type="1">
<li><p>Fire up <a href="https://posit.co/products/open-source/rstudio/"><iconify-icon inline="" icon="cib:rstudio"></iconify-icon> RStudio</a> and choose <code>File</code> → <code>New Project…</code> → <code>New Directory</code> → <code>Quarto Website</code></p></li>
<li><p>Fill out the details in the panel, for example like in the figure below, and click on the <code>Create Project</code> button: <img src="https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/img/rstudio-create-new-project.png" class="img-fluid"></p></li>
<li><p><del>Add a <code>.gitattributes</code> file and change the RStudio line endings setting to <code>None</code> along the lines of <a href="https://gist.github.com/pjastam/03f8b9eca4e97544f02bc55c464f8514#make-adjustments-in-remote-repository">my gist</a> about line ending issues with Quarto.</del></p>
<div class="callout callout-style-default callout-important callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Important
</div>
</div>
<div class="callout-body-container callout-body">
<p>Step 3 no longer applies. I added this step to my original post to prevent GitHub Actions from failing due to hash issues caused by end-of-line character differences between operating systems (see <a href="https://gist.github.com/pjastam/03f8b9eca4e97544f02bc55c464f8514">my gist</a> for more). However, it is no longer needed now that <a href="https://github.com/quarto-dev/quarto-cli/pull/5983">this pull request</a> has resolved the hash issue.</p>
</div>
</div></li>
<li><p>Execute the <code>quarto publish</code> <a href="https://quarto.org/docs/publishing/netlify.html#publish-command">command for <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> in the <strong>Terminal</strong> pane of <a href="https://posit.co/products/open-source/rstudio/"><iconify-icon inline="" icon="cib:rstudio"></iconify-icon> RStudio</a>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">quarto</span> publish netlify</span></code></pre></div></div>
<p>After you press <code>Y</code> to publish with your default account, Quarto automatically creates a <code>_publish.yml</code> file in the local repository that looks like the example below, but with the appropriate id and url values for your site:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb2-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">source</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> project</span></span>
<span id="cb2-2"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">netlify</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb2-3"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">id</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> d1983ae8-da83-4431-928f-3debf80a02a5</span></span>
<span id="cb2-4"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">      </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">url</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://splendorous-bublanina-d544bb.netlify.app'</span></span></code></pre></div></div></li>
<li><p>Add the following lines to the <code>_quarto.yml</code> file in order to make sure that <a href="https://quarto.org/docs/publishing/netlify.html#freezing-computations-1">code is only executed locally</a>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">execute</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb3-2"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">freeze</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> auto</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">  # Re-render only when source changes</span></span></code></pre></div></div></li>
<li><p>Fully re-render your site in the <strong>Terminal</strong> pane:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">quarto</span> render</span></code></pre></div></div></li>
<li><p>Add the output directory of your project to <code>.gitignore</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">/_site/</span></span></code></pre></div></div></li>
<li><p>Go to the Git pane of <a href="https://posit.co/products/open-source/rstudio/"><iconify-icon inline="" icon="cib:rstudio"></iconify-icon> RStudio</a> to commit the files created thus far to your local repository: <code>Git</code> → <code>Commit pending changes</code> → select all files → <code>Stage</code> all files → write a <code>Commit message</code>, for example ‘First commit’ → click the <code>Commit</code> button.</p></li>
<li><p>Execute this command in the <strong>Console</strong> pane of <a href="https://posit.co/products/open-source/rstudio/"><iconify-icon inline="" icon="cib:rstudio"></iconify-icon> RStudio</a> to push your local repo to <a href="https://github.com/"> GitHub</a>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">usethis<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">use_github</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">private =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<p>This will create a <strong>private</strong> remote repository at <a href="https://github.com/"> GitHub</a>. You may change the argument <code>private</code> to <code>FALSE</code> if you want the remote repository to be <strong>public</strong> right away, or change this manually in the settings section of your <a href="https://github.com/"> GitHub</a> repository at some later moment.</p></li>
<li><p>Add <a href="https://github.com/features/actions"> GitHub Actions</a> to your project by creating the YAML file <code>publish.yml</code> and save it to <code>.github/workflows/publish.yml</code>:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">on</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb7-2"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow_dispatch</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb7-3"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">push</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb7-4"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">branches</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> main</span></span>
<span id="cb7-5"></span>
<span id="cb7-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">name</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> Quarto Publish</span></span>
<span id="cb7-7"></span>
<span id="cb7-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">jobs</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb7-9"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">  </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">build-deploy</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb7-10"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runs-on</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> ubuntu-latest</span></span>
<span id="cb7-11"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">    </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">steps</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb7-12"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">      </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">name</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> Check out repository</span></span>
<span id="cb7-13"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">        </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">uses</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> actions/checkout@v2 </span></span>
<span id="cb7-14"></span>
<span id="cb7-15"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">      </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">name</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> Set up Quarto</span></span>
<span id="cb7-16"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">        </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">uses</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> quarto-dev/quarto-actions/setup@v2</span></span>
<span id="cb7-17"></span>
<span id="cb7-18"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">      </span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">-</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">name</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> Render and Publish</span></span>
<span id="cb7-19"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">        </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">uses</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> quarto-dev/quarto-actions/publish@v2</span></span>
<span id="cb7-20"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">        </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span></span>
<span id="cb7-21"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">          </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">target</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> netlify</span></span>
<span id="cb7-22"><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">          </span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">NETLIFY_AUTH_TOKEN</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">:</span><span class="at" style="color: #657422;
background-color: null;
font-style: inherit;"> ${{ secrets.NETLIFY_AUTH_TOKEN }}</span></span></code></pre></div></div></li>
<li><p>Configure your <a href="https://github.com/"> GitHub</a> with the credentials required for publishing to <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> <a href="https://www.netlify.com/">Netlify</a>. This is explained <a href="https://quarto.org/docs/publishing/netlify.html#netlify-credentials">in the Quarto docs</a>. In short:</p>
<ul>
<li>find/create a personal access token at <a href="https://app.netlify.com/user/applications">the <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify applications page</a></li>
<li>go to the remote repository at  GitHub → <code>Settings</code> → <code>Secrets and variables</code> → <code>Actions</code> → click on the button <code>New repository secret</code></li>
<li>fill out the name (i.e.&nbsp;<code>NETLIFY_AUTH_TOKEN</code>) and value of the personal access token</li>
</ul></li>
<li><p>Commit the <code>publish.yml</code> file and subsequently push your local repository (including the <code>_freeze</code> directory) to <a href="https://github.com/"> GitHub</a>. As a consequence, your <a href="https://github.com/features/actions"> GitHub Actions</a> workflow will start running and the website is automatically rendered and published to <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> <a href="https://www.netlify.com/">Netlify</a>. Each subsequent (commit and) push will trigger the <a href="https://github.com/features/actions"> GitHub Actions</a> workflow to start running in the same way.</p></li>
</ol>
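<p>Before the final push in step 12, it can help to confirm that the files from the earlier steps are in place. The snippet below is a minimal sketch of such a check, not part of the original setup: the function name <code>check_ready_to_push</code> is hypothetical, and it only looks for the files created in steps 4, 7 and 10.</p>

```shell
# Hypothetical pre-push check for the steps above; pass the project root
# as the first argument (defaults to the current directory).
# It only inspects files, so it needs neither Quarto nor a network connection.
check_ready_to_push() {
  root="${1:-.}"
  rc=0
  # Step 4 creates _publish.yml and step 10 the workflow file.
  for f in _publish.yml .github/workflows/publish.yml; do
    if [ ! -e "$root/$f" ]; then
      echo "missing: $f"
      rc=1
    fi
  done
  # Step 7: the rendered output directory should be ignored by Git.
  # -q: quiet, -x: match the whole line, -F: fixed string instead of a regex
  if ! grep -qxF -- "/_site/" "$root/.gitignore" 2>/dev/null; then
    echo "not ignored: /_site/"
    rc=1
  fi
  return "$rc"
}
```

<p>Running <code>check_ready_to_push</code> from the project root in the <strong>Terminal</strong> pane prints one line per missing item and returns a non-zero exit status if anything is missing.</p>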
<p>The <a href="https://quarto.org/">Quarto</a> website that is described in this example was automatically named <u>https://splendorous-bublanina-d544bb.netlify.app/</u> by <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify. To remember it better, I renamed it to <a href="https://template-quarto-website.netlify.app/">https://template-quarto-website.netlify.app/</a> in my <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify account. I changed the <code>url</code> field in the <code>_publish.yml</code> file accordingly. You can visit the <a href="https://quarto.org/">Quarto</a> website at this latter URL.</p>
<p>The remote repository is located at <a href="https://github.com/pjastam/template-quarto-website/">https://github.com/pjastam/template-quarto-website/</a>. Note that this repository is set up as a template, see the section <a href="../../posts/2023-05-29-deploy-site-to-netlify-with-github-actions/index.html#a-versatile-template">“A versatile template”</a>.</p>
</section>
<section id="github-actions" class="level2">
<h2 class="anchored" data-anchor-id="github-actions"><del>GitHub Actions</del></h2>
<p>Throughout this guide, I have assumed that a <a href="https://github.com/actions/runner">GitHub Actions Runner</a> executes the job defined by the <a href="https://github.com/features/actions"> GitHub Actions</a> workflow. Note that you can achieve the same outcome without <a href="https://github.com/features/actions"> GitHub Actions</a>. More specifically, you can let <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> <a href="https://www.netlify.com/">Netlify</a> itself render the website before it deploys the result to its servers. To follow this alternative approach, skip the last three steps (10-12) of this how-to guide and <a href="https://quarto.org/docs/publishing/netlify.html#plugin-configuration">add two extra files</a> after updating the <code>.gitignore</code> file (i.e., after step 7). The files to add are a <code>netlify.toml</code> file with the following lines of code:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode toml code-with-copy"><code class="sourceCode toml"><span id="cb8-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[[plugins]]</span></span>
<span id="cb8-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">package</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"@quarto/netlify-plugin-quarto"</span></span></code></pre></div></div>
<p>and a <code>package.json</code> file with the following lines of code:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb9-2">  <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"dependencies"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb9-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">"@quarto/netlify-plugin-quarto"</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">:</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^0.0.5"</span></span>
<span id="cb9-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb9-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>Finish by committing all files to your local repository (step 8) and pushing your local repository to GitHub (step 9). Again, do <strong>not</strong> execute steps 10-12 in this scenario.</p>
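<p>Since the two approaches are alternatives, it is easy to end up with both configured by accident. The following is a minimal sketch, with the hypothetical function name <code>publish_mode</code>, that reports which setup a project directory contains. It assumes the file names used in this post.</p>

```shell
# Hypothetical helper: report which of the two publishing setups a project uses.
# Pass the project root as the first argument (defaults to the current directory).
publish_mode() {
  root="${1:-.}"
  actions=no
  plugin=no
  # The GitHub Actions route is recognised by the workflow file from step 10.
  if [ -f "$root/.github/workflows/publish.yml" ]; then
    actions=yes
  fi
  # The Netlify route is recognised by the plugin name inside netlify.toml.
  if grep -qF -- "@quarto/netlify-plugin-quarto" "$root/netlify.toml" 2>/dev/null; then
    plugin=yes
  fi
  case "$actions/$plugin" in
    yes/no)  echo "github-actions" ;;
    no/yes)  echo "netlify-plugin" ;;
    yes/yes) echo "both" ;;
    *)       echo "none" ;;
  esac
}
```

<p>If <code>publish_mode</code> prints <code>both</code>, the project mixes the two scenarios, which this guide advises against.</p>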
</section>
<section id="a-versatile-template" class="level2">
<h2 class="anchored" data-anchor-id="a-versatile-template">A versatile template</h2>
<p>The remote repository that results from the steps described in this step-by-step guide is located at <a href="https://github.com/pjastam/template-quarto-website">https://github.com/pjastam/template-quarto-website</a>. To use it as a template repository for your <a href="https://quarto.org/">Quarto</a> websites, I have checked the <code>Template repository</code> box in the <code>Settings</code> tab of this remote repository. Furthermore, note that the two lines of code with the Netlify credentials (see step 4 of the <a href="../../posts/2023-05-29-deploy-site-to-netlify-with-github-actions/index.html#step-by-step-guide">step-by-step guide</a>) are commented out (i.e.&nbsp;deactivated). In this way, people are protected from mistakenly using these credentials instead of the Netlify credentials of their own site.</p>
<p>To actually use this template, there are 6 steps involved:</p>
<ol type="1">
<li><p>Use my template to create your new repository by clicking this button at the top of the remote repository at  GitHub:</p>
<p><img src="https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/img/github-use-this-template.png" class="img-fluid"></p></li>
<li><p>Fire up <a href="https://posit.co/products/open-source/rstudio/"><iconify-icon inline="" icon="cib:rstudio"></iconify-icon> RStudio</a>, choose <code>File</code> → <code>New Project…</code> → <code>Version Control</code> → <code>Git</code> and fill out the web URL of your new repository to clone it to your local environment. You can find this web URL at the top of your new repo at  GitHub:</p>
<p><img src="https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/img/github-local-clone-repo.png" class="img-fluid"></p></li>
<li><p>Delete the <code>_publish.yml</code> file from your local repository.</p></li>
<li><p>Execute step 4 of the <a href="../../posts/2023-05-29-deploy-site-to-netlify-with-github-actions/index.html#step-by-step-guide">step-by-step guide</a>.</p>
<div class="callout callout-style-default callout-caution callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Caution
</div>
</div>
<div class="callout-body-container callout-body">
<p>Before continuing with step 5, please double-check that the <code>_publish.yml</code> file in your local repository now contains <strong>your own <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify credentials</strong>. Otherwise the workflow will never run the right way.</p>
</div>
</div></li>
<li><p>Execute step 11 of the <a href="../../posts/2023-05-29-deploy-site-to-netlify-with-github-actions/index.html#step-by-step-guide">step-by-step guide</a>.</p></li>
<li><p>Execute step 12 of the <a href="../../posts/2023-05-29-deploy-site-to-netlify-with-github-actions/index.html#step-by-step-guide">step-by-step guide</a>.</p></li>
</ol>
<p>Your <a href="https://quarto.org/">Quarto</a> website will be up and running any moment.</p>
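<p>The double-check from the caution box in step 4 can be scripted. The snippet below is a minimal sketch under two assumptions: the function name <code>check_publish_yml</code> is hypothetical, and the template's site id is taken to be the one shown in the <code>_publish.yml</code> example of the step-by-step guide.</p>

```shell
# Assumption: this is the site id from the _publish.yml example in this post.
TEMPLATE_SITE_ID="d1983ae8-da83-4431-928f-3debf80a02a5"

# Hypothetical check: fail if _publish.yml is missing, has no active id line,
# or still carries the site id of the template.
check_publish_yml() {
  file="${1:-_publish.yml}"
  if [ ! -f "$file" ]; then
    echo "no _publish.yml found; run 'quarto publish netlify' first"
    return 1
  fi
  # Drop commented-out lines, then keep the lines that hold an id field.
  ids=$(grep -v '^[[:space:]]*#' "$file" | grep -E '^[[:space:]]*-?[[:space:]]*id:' || true)
  if [ -z "$ids" ]; then
    echo "no active id line in $file"
    return 1
  fi
  if printf '%s\n' "$ids" | grep -qF -- "$TEMPLATE_SITE_ID"; then
    echo "$file still points at the template site"
    return 1
  fi
  return 0
}
```

<p>Run <code>check_publish_yml</code> from the project root before step 5. It stays silent and returns exit status zero when the file holds an id other than the template's.</p>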
</section>
<section id="whats-next" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="whats-next">What’s Next?</h2>
<p>In this guide, I have created a <a href="https://quarto.org/">Quarto</a> website to be used as a research compendium.<sup>1</sup> However, the current content is rather limited for this use case. I plan to enhance it with essential compendium components and publish that repository on <a href="https://github.com/"> GitHub</a> in the near future. You can use that as a reference template for your own research compendium, so stay tuned for updates!</p>


<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;<a href="https://quarto.org/docs/publishing/netlify.html">The Quarto docs about publishing on <iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> were a nice starting point for this how-to guide.</p></div></div></section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2023,
  author = {Stam, Piet},
  title = {GitHub Template for {Quarto} Website on {Netlify}},
  date = {2023-05-29},
  url = {https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2023" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2023. <span>“GitHub Template for Quarto Website on
Netlify.”</span> May 29, 2023. <a href="https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/">https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/</a>.
</div></div></section></div> ]]></description>
  <category>digital transformation</category>
  <guid>https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/</guid>
  <pubDate>Mon, 29 May 2023 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2023-05-29-deploy-site-to-netlify-with-github-actions/img/quarto-ghactions-netlify-rstudio.png" medium="image" type="image/png" height="116" width="144"/>
</item>
<item>
  <title>Regression Analysis of Aggregate Data in R</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2023-02-26-ols-estimates-aggregate-data/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>When dealing with large sets of data or sensitive individual-level information, researchers often use a technique to vertically <a href="https://en.wikipedia.org/wiki/Aggregate_data">aggregate the data</a>. This involves combining the data into larger groups to conserve computer resources or maintain privacy.</p>
<p>But a key question arises: can statistical analyses based on this aggregated data still yield accurate results? The answer, at least for regression analysis, is a resounding yes. In fact, we can use the estimation results based on the aggregate data to derive the estimates of a linear regression based on the original, individual-level data.</p>
<p>To help illustrate this concept, I’ve written R code to reproduce an example from a scientific publication where the authors provided the data and code in SAS. With this example, you can see how to aggregate the data and derive the original regression results.</p>
</section>
<section id="the-example" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="the-example">The example</h2>
<p>Rahim Moineddin and Marcelo Luis Urquia have shown that the estimated coefficients of an individual-level regression model and an aggregate-level regression model are identical in the case of a continuous outcome variable and categorical predictors.<sup>1</sup> This equality does not hold for the standard errors of these point estimates, but they show how to derive the individual-level standard errors from the aggregate data. Furthermore, they added SAS code and the example data set to show us the way.<sup>2</sup> In this post, I provide the R code to reproduce this example.</p>
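<p>To see why this works, here is a sketch of the standard least-squares algebra. The notation is my own assumption, not taken from the publication: cell \(j\) groups the \(n_j\) individuals who share the same predictor values \(x_j\), with cell mean \(\bar{y}_j\) and within-cell sample variance \(s_j^2\); \(N = \sum_j n_j\) is the total number of individuals and \(p\) the number of coefficients.</p>

```latex
% Coefficients: OLS on individuals equals weighted least squares on cell means
\[
\hat{\beta} = \Bigl( \sum_{j} n_j \, x_j x_j^{\top} \Bigr)^{-1} \sum_{j} n_j \, x_j \, \bar{y}_j
\]

% Residual variance: within-cell part plus between-cell part
\[
\hat{\sigma}^2 = \frac{ \sum_{j} (n_j - 1) \, s_j^2 + \sum_{j} n_j \, ( \bar{y}_j - x_j^{\top} \hat{\beta} )^2 }{ N - p }
\]

% Covariance matrix of the coefficients at the individual level
\[
\widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^2 \Bigl( \sum_{j} n_j \, x_j x_j^{\top} \Bigr)^{-1}
\]
```

<p>The first line only needs the cell counts and cell means, which is why the coefficients coincide. The second line shows what a regression on cell means alone misses: the within-cell variances \(s_j^2\), which therefore have to be kept in the aggregate data set.</p>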
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;They are the authors of the publication “<a href="https://doi.org/10.1097/ede.0000000000000172">Regression analysis of aggregate continuous data</a>”.</p></div><div id="fn2"><p><sup>2</sup>&nbsp;These can be found in their <a href="https://links.lww.com/EDE/A830">Supplemental Digital Content</a> document.</p></div></div></section>
<section id="read-data" class="level2">
<h2 class="anchored" data-anchor-id="read-data">Read data</h2>
<p>The example data that I will use is in the file called <code>wic.txt</code>. These are the example data that the aforementioned authors use to illustrate the application of their method in SAS. In this example, they focused on the association between receipt of <a href="http://www.fns.usda.gov/wic/about-wic">WIC</a> (The Special Supplemental Nutrition Program for Women, Infants, and Children) food for the mother during this pregnancy and gestational weight gain. The example data are scraped from the fourth section of their Supplemental Digital Content document.</p>
<p>It contains four variables:</p>
<ul>
<li><p>Weight gain during pregnancy in pounds (lb) (<code>wtgain</code>)</p></li>
<li><p>Receipt of WIC food (<code>wic</code>)</p></li>
<li><p>Race/ethnicity (<code>mracethn</code>)</p></li>
<li><p>Late or no prenatal care (<code>latecare</code>)</p></li>
</ul>
<p>Let’s read the data.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(readtext)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(dplyr)</span>
<span id="cb1-3">data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">readtext</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://cdn-links.lww.com/permalink/ede/a/ede_25_6_2014_07_23_urquia_ede14-408_sdc1.doc"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb1-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">subset</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">select =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"text"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb1-5">  { <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read.table</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sub</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".*4) individual-level dataset"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, .), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">header=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) }</span></code></pre></div></div>
</div>
</section>
<section id="apply-transformations" class="level2">
<h2 class="anchored" data-anchor-id="apply-transformations">Apply transformations</h2>
<p>Before we perform the regression, we apply the same data transformations as in the second section of the Supplemental Digital Content document. First, the weight gain during pregnancy in kilograms (<code>wtgaink</code>) is calculated by multiplying the weight gain during pregnancy in pounds (lb) (<code>wtgain</code>) by the conversion factor <code>0.453592</code>. Second, the categorical variables <code>wic</code>, <code>mracethn</code> and <code>latecare</code> are converted to the factor type. Third, the reference level of the categorical variable <code>mracethn</code> is set to the last category, ‘Non-Hispanic Blacks’. Finally, the <code>wtgain</code> variable is no longer needed and is therefore dropped.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">db_indiv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> data <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename_all</span>(tolower) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb2-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">wtgaink =</span> wtgain <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.453592</span>,</span>
<span id="cb2-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">wic =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(wic),</span>
<span id="cb2-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mracethn =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(mracethn),</span>
<span id="cb2-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mracethn =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">relevel</span>(mracethn, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ref =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>),</span>
<span id="cb2-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">latecare =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.factor</span>(latecare)</span>
<span id="cb2-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(wtgaink, wic, mracethn, latecare)</span></code></pre></div></div>
</div>
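<p>A side note on the <code>relevel()</code> call: <code>ref = 3</code> picks the reference level by position, which silently depends on the order of the factor levels. A minimal sketch on a made-up factor (the vector <code>f</code> is hypothetical and not part of the data set) suggests that passing the level name as a string gives the same result while being more explicit:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># hypothetical factor with the levels "1", "2" and "3"
f &lt;- factor(c(1, 2, 3, 2, 1, 3))
# reference level chosen by position (as in the code above)
by_position &lt;- relevel(f, ref = 3)
# reference level chosen by name
by_name &lt;- relevel(f, ref = "3")
# both calls move level "3" to the front of the level list
identical(by_position, by_name)
levels(by_name)</code></pre></div>
</div>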
</section>
<section id="fit-regression-model" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="fit-regression-model">Fit regression model</h2>
<p>The table presented in the publication by Moineddin and Urquia shows the estimation results of three models. The first model has a single predictor, the second model three predictors and the third model includes interaction effects among some of the aforementioned predictors. Note that in all three models, the continuous outcome <code>wtgaink</code> is predicted by a set of variables which are exclusively of the categorical type.</p>
<p>For brevity, let’s do the exercise for one of the three models. We choose the second model, as this is also the model that is coded in SAS in the aforementioned Supplemental Digital Content document. Let’s first estimate the model parameters based on individual data. The results appear to be the same as those in the aforementioned table, with one notable exception: the estimate of the constant term.<sup>3</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn3"><p><sup>3</sup>&nbsp;I did a quick check and got the authors’ estimate of the constant term after some recoding of the predictor variables. You can do this quick check yourself with this <a href="https://gist.github.com/pjastam/14cfd2a60b0787239ed6a2f18c3ec03b#file-quick-check-r">gist</a>.</p></div></div><div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">model_indiv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(</span>
<span id="cb3-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> wtgaink <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> wic <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mracethn <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> latecare, </span>
<span id="cb3-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> db_indiv</span>
<span id="cb3-5">  )</span>
<span id="cb3-6">summ_indiv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model_indiv)</span>
<span id="cb3-7">summ_indiv</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = wtgaink ~ wic + mracethn + latecare, data = db_indiv)

Residuals:
     Min       1Q   Median       3Q      Max 
-16.2466  -4.1155  -0.6376   3.4447  28.6329 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)  14.5874     0.2620  55.680  &lt; 2e-16 ***
wicY          0.5652     0.2073   2.726  0.00642 ** 
mracethn1    -0.4928     0.2443  -2.017  0.04374 *  
mracethn2     1.0940     0.2232   4.902 9.76e-07 ***
latecare1    -0.2014     0.1670  -1.206  0.22773    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.942 on 5265 degrees of freedom
Multiple R-squared:  0.01552,   Adjusted R-squared:  0.01477 
F-statistic: 20.75 on 4 and 5265 DF,  p-value: &lt; 2.2e-16</code></pre>
</div>
</div>
</section>
<section id="aggregate-data" class="level2">
<h2 class="anchored" data-anchor-id="aggregate-data">Aggregate data</h2>
<p>Next, let’s aggregate the data set. The aggregate data set contains averages of the outcome variable and frequencies of the categorical variables <em>for all combinations of the set of categorical predictors</em>. Thus, for each unique combination of the categorical variables, we calculate the number of individual records <code>N</code>, the mean <code>meany</code> of the continuous variable <code>wtgaink</code>, and its standard deviation <code>stdy</code>. These aggregates suffice to derive the individual-level point estimates and their standard errors afterwards.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">db_aggr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb5-2">  db_indiv <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(wic, mracethn, latecare))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(</span>
<span id="cb5-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">meany =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(wtgaink),</span>
<span id="cb5-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stdy =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(wtgaink),</span>
<span id="cb5-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">N =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(wtgaink)</span>
<span id="cb5-8">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb5-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>()</span></code></pre></div></div>
</div>
</section>
<section id="derive-original-estimates" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="derive-original-estimates">Derive original estimates</h2>
<p>Now it’s time to derive the individual-level estimates of the coefficients and standard errors based on these aggregate results. Based on the aggregate data, a weighted regression model for the <code>meany</code> variable is fitted with <code>N</code> being the weight variable. Note that the variable <code>stdy</code> is not used in this step.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">model_aggr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(</span>
<span id="cb6-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> meany <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> wic <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> mracethn <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> latecare,</span>
<span id="cb6-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> db_aggr,</span>
<span id="cb6-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weights =</span> N</span>
<span id="cb6-6">  )</span>
<span id="cb6-7">summ_aggr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model_aggr)</span>
<span id="cb6-8">summ_aggr</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = meany ~ wic + mracethn + latecare, data = db_aggr, 
    weights = N)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-7.2204 -3.7597 -0.2019  4.1956  7.4323 

Coefficients:
            Estimate Std. Error t value Pr(&gt;|t|)    
(Intercept)  14.5874     0.2682  54.380 1.86e-10 ***
wicY          0.5652     0.2122   2.663  0.03234 *  
mracethn1    -0.4928     0.2502  -1.970  0.08949 .  
mracethn2     1.0940     0.2285   4.788  0.00199 ** 
latecare1    -0.2014     0.1710  -1.178  0.27721    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.084 on 7 degrees of freedom
Multiple R-squared:  0.9188,    Adjusted R-squared:  0.8723 
F-statistic: 19.79 on 4 and 7 DF,  p-value: 0.0006444</code></pre>
</div>
</div>
<p>Deriving the coefficients turns out to be very easy, because their estimates at the aggregate level are identical to those at the individual level and therefore do not need any adjustment. A quick check confirms this equality.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all.equal</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(model_indiv),<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coef</span>(model_aggr))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] TRUE</code></pre>
</div>
</div>
<p>In order to derive the standard errors, however, a number of data transformations are necessary. To understand the purpose of these transformations, notice that there exists a fixed ratio between the individual-level standard errors and those at the aggregate level.<sup>4</sup> With our data this fixed ratio is equal to 0.9766648 for all estimated coefficients:</p>
<div class="no-row-height column-margin column-container"><div id="fn4"><p><sup>4</sup>&nbsp;This observation was also made <a href="https://stats.stackexchange.com/questions/83223/standard-errors-in-weighted-least-squares-on-aggregated-data">here</a>, using example data with both categorical outcome and predictor variables.</p></div></div><div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">summ_indiv<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coef[,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>summ_aggr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coef[,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>(Intercept)        wicY   mracethn1   mracethn2   latecare1 
  0.9766648   0.9766648   0.9766648   0.9766648   0.9766648 </code></pre>
</div>
</div>
<p>Thus, given the aggregate data and estimation results, it seems a logical strategy to first calculate this fixed ratio and then multiply it by the aggregate-level standard errors in order to arrive at the individual-level standard errors. This fixed ratio is called an inflation factor. The original SAS code for this procedure can be found in the Supplemental Digital Content document.</p>
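<p>Why the ratio is the same for every coefficient can be checked on simulated data. In the sketch below, the data frame <code>d</code>, its columns <code>a</code>, <code>b</code> and <code>y</code>, and all object names are made up for illustration. The weighted regression on group means and the individual-level regression share the same weighted cross-product matrix of the predictors, so their standard errors should differ only through the residual standard deviation. The ratio of the standard errors should then equal the ratio of the two <code>sigma()</code> values.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># simulate a hypothetical individual-level data set with two categorical predictors
set.seed(1)
n &lt;- 600
d &lt;- data.frame(
  a = factor(sample(c("N", "Y"), n, replace = TRUE)),
  b = factor(sample(1:3, n, replace = TRUE))
)
d$y &lt;- 10 + 0.5 * (d$a == "Y") + as.integer(d$b) + rnorm(n)

# aggregate to one row per combination of the predictors
agg &lt;- aggregate(y ~ a + b, data = d, FUN = mean)
names(agg)[3] &lt;- "meany"
agg$N &lt;- aggregate(y ~ a + b, data = d, FUN = length)$y

# individual-level fit and weighted fit on the group means
fit_ind &lt;- lm(y ~ a + b, data = d)
fit_agg &lt;- lm(meany ~ a + b, data = agg, weights = N)

# ratio of standard errors, one value per coefficient
se_ratio &lt;- coef(summary(fit_ind))[, 2] / coef(summary(fit_agg))[, 2]
# ratio of residual standard deviations, a single number
sigma_ratio &lt;- sigma(fit_ind) / sigma(fit_agg)

# every element of se_ratio should coincide with sigma_ratio
all.equal(unname(se_ratio), rep(sigma_ratio, length(se_ratio)))</code></pre></div>
</div>
<p>If this holds, recovering the individual-level standard errors comes down to estimating the individual-level residual standard deviation from the aggregates.</p>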
<p>My R code for deriving the individual-level standard errors is given below. For clarity, I added a short description of what’s being calculated above each line of code. We estimate the inflation factor <code>factor</code> as the square root of the ratio of the so-called pooled variance <code>p_v</code> to the aggregate-level residual variance <code>errorms</code>.<sup>5</sup> This ratio is an estimate of the inflation factor, because <code>p_v</code> estimates the variance of the individual-level outcome within the groups, which approximates the individual-level residual variance. Finally, the aggregate-level standard errors called <code>StdErr</code> are multiplied by this <code>factor</code> to derive the individual-level standard errors called <code>standard_error</code> that we are looking for.</p>
<div class="no-row-height column-margin column-container"><div id="fn5"><p><sup>5</sup>&nbsp;See <a href="https://en.wikipedia.org/wiki/Pooled_variance#Definition_and_computation">Wikipedia</a> for a definition of pooled variance (assuming non-uniform sample sizes).</p></div></div><div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">standard_errors <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb12-2">  db_aggr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb12-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb12-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># degrees of freedom used to calculate stdy</span></span>
<span id="cb12-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_1 =</span> N <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,</span>
<span id="cb12-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># within-group sum of squares derived from stdy</span></span>
<span id="cb12-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">S_n_1 =</span> stdy <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> n_1</span>
<span id="cb12-8">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb12-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(</span>
<span id="cb12-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># pooled degrees of freedom (sum of N - 1 over all groups)</span></span>
<span id="cb12-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_k =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(n_1),</span>
<span id="cb12-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># pooled within-group sum of squares</span></span>
<span id="cb12-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">S2_k =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(S_n_1),</span>
<span id="cb12-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># pooled variance (i.e. weighted average of group variances)</span></span>
<span id="cb12-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_v =</span> S2_k <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> n_k,</span>
<span id="cb12-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># aggregate-level residual variance</span></span>
<span id="cb12-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">errorms =</span> summ_aggr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sigma <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,</span>
<span id="cb12-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># inflation factor</span></span>
<span id="cb12-19">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">factor =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(p_v <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> errorms),</span>
<span id="cb12-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># aggregate-level standard errors</span></span>
<span id="cb12-21">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">StdErr =</span> summ_aggr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>coef[ , <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>],</span>
<span id="cb12-22">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># individual-level standard errors</span></span>
<span id="cb12-23">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">standard_error =</span> StdErr <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> factor</span>
<span id="cb12-24">  )</span></code></pre></div></div>
</div>
</section>
<section id="show-table" class="level2">
<h2 class="anchored" data-anchor-id="show-table">Show table</h2>
<p>The variable <code>factor</code> has 5 identical elements equal to 0.9766333, which differs only very slightly from the true inflation factor of 0.9766648 computed above. The resulting standard errors are reported in the table below. As desired, they match the standard errors estimated by the individual-level regression to the three decimals shown. In addition to the standard errors, the table also shows the estimated coefficients, the lower and upper bounds of the confidence intervals, the z-scores and the p-values. As a validity check, compare the estimates reported here with the individual-level regression estimates shown above and with the estimates reported in the Moineddin and Urquia publication.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(insight)</span>
<span id="cb13-2">standard_errors <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb13-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">estimate =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coefficients</span>(model_aggr),</span>
<span id="cb13-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> estimate <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> standard_error,</span>
<span id="cb13-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pnorm</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(z))),</span>
<span id="cb13-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pvalue =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">format_p</span>(p, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">stars =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb13-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">CI_low =</span> estimate <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.96</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> standard_error,</span>
<span id="cb13-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">CI_high =</span> estimate <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.96</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> standard_error,</span>
<span id="cb13-10">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb13-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_cols</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"predictors"</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(standard_errors<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>standard_error)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(predictors, estimate, standard_error, CI_low, CI_high, z, pvalue) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb13-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">format_table</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ci_digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>   predictors estimate standard_error               CI      z      pvalue
1 (Intercept)   14.587          0.262 [14.074, 15.101] 55.681 p &lt; .001***
2        wicY    0.565          0.207 [ 0.159,  0.971]  2.726 p = 0.006**
3   mracethn1   -0.493          0.244 [-0.972, -0.014] -2.017  p = 0.044*
4   mracethn2    1.094          0.223 [ 0.657,  1.531]  4.902 p &lt; .001***
5   latecare1   -0.201          0.167 [-0.529,  0.126] -1.206   p = 0.228</code></pre>
</div>
</div>
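<p>The small gap between 0.9766333 and 0.9766648 seems to come from the degrees of freedom. The pooled variance divides the within-group sum of squares by the sum of <code>N - 1</code>, whereas the individual-level regression divides its residual sum of squares (the within-group part plus the weighted aggregate residual sum of squares) by the number of records minus the number of coefficients. The sketch below checks this reading on simulated data. All object names are hypothetical, and this is my own illustration rather than the procedure from the Supplemental Digital Content document.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># simulate a hypothetical individual-level data set with two categorical predictors
set.seed(1)
n &lt;- 600
d &lt;- data.frame(
  a = factor(sample(c("N", "Y"), n, replace = TRUE)),
  b = factor(sample(1:3, n, replace = TRUE))
)
d$y &lt;- 10 + 0.5 * (d$a == "Y") + as.integer(d$b) + rnorm(n)

# aggregate to group means, standard deviations and counts
agg &lt;- aggregate(y ~ a + b, data = d, FUN = mean)
names(agg)[3] &lt;- "meany"
agg$stdy &lt;- aggregate(y ~ a + b, data = d, FUN = sd)$y
agg$N &lt;- aggregate(y ~ a + b, data = d, FUN = length)$y

fit_ind &lt;- lm(y ~ a + b, data = d)
fit_agg &lt;- lm(meany ~ a + b, data = agg, weights = N)

# within-group sum of squares, recovered from the aggregates only
within_ss &lt;- sum(agg$stdy^2 * (agg$N - 1))
# weighted residual sum of squares of the aggregate regression
rss_agg &lt;- sigma(fit_agg)^2 * df.residual(fit_agg)
# individual-level residual standard deviation rebuilt from both parts
n_coef &lt;- length(coef(fit_agg))
sigma_exact &lt;- sqrt((within_ss + rss_agg) / (sum(agg$N) - n_coef))

# should coincide with the individual-level fit
all.equal(sigma_exact, sigma(fit_ind))

# inflation factor based on the pooled variance versus the rebuilt one
c(
  pooled = sqrt(within_ss / sum(agg$N - 1)) / sigma(fit_agg),
  exact = sigma_exact / sigma(fit_agg)
)</code></pre></div>
</div>
<p>With many records per group the two factors are nearly indistinguishable, which is in line with the three-decimal agreement in the table above.</p>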
</section>
<section id="whats-next" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="whats-next">What’s next?</h2>
<p>In the above example, the ordinary least-squares algorithm was used to estimate the parameters of a linear regression. With the R code presented here, one can apply this algorithm to aggregate data and derive both point estimates and standard errors afterwards, which reduces the use of scarce computer resources.<sup>6</sup> As this procedure is especially useful for linear regression on big data, a logical next question is whether vertical aggregation is also helpful with other algorithms commonly applied to big data, such as random forest. This may be the subject of another blog post.</p>
<div class="no-row-height column-margin column-container"><div id="fn6"><p><sup>6</sup>&nbsp;Here is a <a href="https://gist.github.com/pjastam/14cfd2a60b0787239ed6a2f18c3ec03b#file-ols-estimates-aggegrate-data-r">gist</a> with the complete example code.</p></div></div><p>Happy coding!</p>


</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2023,
  author = {Stam, Piet},
  title = {Regression {Analysis} of {Aggregate} {Data} in {R}},
  date = {2023-02-26},
  url = {https://www.pietstam.nl/posts/2023-02-26-ols-estimates-aggregate-data/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2023" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2023. <span>“Regression Analysis of Aggregate Data in
R.”</span> February 26, 2023. <a href="https://www.pietstam.nl/posts/2023-02-26-ols-estimates-aggregate-data/">https://www.pietstam.nl/posts/2023-02-26-ols-estimates-aggregate-data/</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <guid>https://www.pietstam.nl/posts/2023-02-26-ols-estimates-aggregate-data/</guid>
  <pubDate>Sun, 26 Feb 2023 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2023-02-26-ols-estimates-aggregate-data/640px-Diagram_of_aggregate_data_chrom.png" medium="image" type="image/png" height="105" width="144"/>
</item>
<item>
  <title>Use tidymodels with weighted and unweighted data</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2022-09-19-tidymodels-weighted-data/</link>
  <description><![CDATA[ 






<section id="use-case" class="level2">
<h2 class="anchored" data-anchor-id="use-case">Use case</h2>
<p>The <a href="https://www.tidymodels.org/">tidymodels</a> framework is a collection of packages for modeling and machine learning using tidyverse principles. The <a href="https://www.tidymodels.org/start/">get started</a> case study helps to take the first steps. Another helpful source is <a href="https://www.gmudatamining.com/lesson-10-r-tutorial.html">lesson 10</a> of an R tutorial from a <a href="https://www.gmudatamining.com/index.html">data mining course</a> at George Mason University.</p>
<p>Building on these basics, my next step is to apply frequency weights when estimating a linear regression model in the tidymodels way of coding. However, <a href="https://www.tidyverse.org/blog/2022/05/case-weights/">this blog post</a> shows that this is a feature <a href="https://github.com/tidymodels/planning/tree/main/case-weights">under development</a> and therefore some of my first attempts to create a reproducible example failed.</p>
<p>The tidymodels <a href="https://workflows.tidymodels.org/reference/add_case_weights.html?q=add_case#ref-examples">how-to add case weights to a workflow</a> gives some examples with code that helps to crack the case. Below I give the code for two reproducible examples, one of model estimation without weights and one with weights.</p>
</section>
<section id="data-and-method" class="level2">
<h2 class="anchored" data-anchor-id="data-and-method">Data and method</h2>
<p>The models that I estimate are linear regression models with a set of predictors and one numeric outcome variable. The parameters of this model are estimated by ordinary least squares.</p>
<p>I use the <code>car_prices</code> data set for the examples and try to predict the car prices with the car brands as predictors. Note that, as a consequence, in my examples the outcome variable is non-negative and the predictors are mutually exclusive (0/1) dummy variables. This makes the examples easy to understand, but the code may apply to a wider range of variables nonetheless. I use <code>mileage</code> as the weighting variable.</p>
<p>Let us start with loading the data into memory.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load the recipes, parsnip, workflows and hardhat packages, along with the rest of tidymodels</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>✔ broom        1.0.1      ✔ recipes      1.0.1 
✔ dials        1.0.0      ✔ rsample      1.1.0 
✔ dplyr        1.0.10     ✔ tibble       3.1.8 
✔ ggplot2      3.3.6      ✔ tidyr        1.2.0 
✔ infer        1.0.3      ✔ tune         1.0.0 
✔ modeldata    1.0.0      ✔ workflows    1.0.0 
✔ parsnip      1.0.1      ✔ workflowsets 1.0.0 
✔ purrr        0.3.4      ✔ yardstick    1.0.0 </code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ purrr::discard() masks scales::discard()
✖ dplyr::filter()  masks stats::filter()
✖ dplyr::lag()     masks stats::lag()
✖ recipes::step()  masks stats::step()
• Dig deeper into tidy modeling with R at https://www.tmwr.org</code></pre>
</div>
</div>
<p>Now we select only the relevant variables. Although the weights are not used in the first example, <code>Mileage</code> is already declared as the frequency weighting variable in the data set.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Create a data set with one non-negative continuous outcome and mutually exclusive dummy variables as predictors</span></span>
<span id="cb5-2">db <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(car_prices, Price, Buick<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>Saturn, Mileage) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Mileage =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">frequency_weights</span>(Mileage))</span>
<span id="cb5-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(db)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>tibble [804 × 8] (S3: tbl_df/tbl/data.frame)
 $ Price   : num [1:804] 22661 21725 29143 30732 33359 ...
 $ Buick   : int [1:804] 1 0 0 0 0 0 0 0 0 0 ...
 $ Cadillac: int [1:804] 0 0 0 0 0 0 0 0 0 0 ...
 $ Chevy   : int [1:804] 0 1 0 0 0 0 0 0 0 0 ...
 $ Pontiac : int [1:804] 0 0 0 0 0 0 0 0 0 0 ...
 $ Saab    : int [1:804] 0 0 1 1 1 1 1 1 1 1 ...
 $ Saturn  : int [1:804] 0 0 0 0 0 0 0 0 0 0 ...
 $ Mileage : freq_wts [1:804] 20105, 13457, 31655, 22479, 17590, 23635, 17381, 2755...</code></pre>
</div>
</div>
</section>
<section id="example-1-linear-regression-without-weights" class="level2">
<h2 class="anchored" data-anchor-id="example-1-linear-regression-without-weights">Example 1: linear regression without weights</h2>
<p>Now on with the first example. In the code below we define the <code>recipe</code>, define the model, and set its mode and engine. These are combined into a workflow. Afterwards we look at the properties of these objects to check whether they are as expected. Note that <code>Saturn</code> is the reference dummy variable of my choice (i.e.&nbsp;in effect its coefficient is fixed at zero) and is thus excluded from the regression.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Get data ready for modeling with the recipes package</span></span>
<span id="cb7-2">recipe1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-3">  db <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(Price <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Buick <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Cadillac <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Chevy <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Pontiac <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Saab) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># add all dummy variables but one</span></span>
<span id="cb7-5"></span>
<span id="cb7-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Define model, mode and engine with parsnip package</span></span>
<span id="cb7-7">model1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">linear_reg</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># adds the basic model type</span></span>
<span id="cb7-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'lm'</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># adds the computational engine to estimate the model parameters</span></span>
<span id="cb7-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_mode</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'regression'</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># adds the modeling context in which it will be used</span></span>
<span id="cb7-11"></span>
<span id="cb7-12"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Bundle pre-processing, modeling, and post-processing with the workflows package</span></span>
<span id="cb7-13">workflow1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb7-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(recipe1) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(model1)</span>
<span id="cb7-17"></span>
<span id="cb7-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View object properties</span></span>
<span id="cb7-19">recipe1</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Recipe

Inputs:

      role #variables
   outcome          1
 predictor          5</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">model1</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Linear Regression Model Specification (regression)

Computational engine: lm </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">workflow1</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
0 Recipe Steps

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm </code></pre>
</div>
</div>
<p>Now that the objects look alright, the model estimation can be performed and the parameter estimates are printed.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Now estimate the model via a single call to fit()</span></span>
<span id="cb13-2">fit1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(workflow1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> db)</span>
<span id="cb13-3"></span>
<span id="cb13-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View fit1 properties</span></span>
<span id="cb13-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>(fit1)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 5
  term        estimate std.error statistic   p.value
  &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;
1 (Intercept)   13979.      763.     18.3  7.62e- 63
2 Buick          6836.     1009.      6.77 2.46e- 11
3 Cadillac      26958.     1009.     26.7  9.16e-113
4 Chevy          2449.      832.      2.94 3.32e-  3
5 Pontiac        4433.      903.      4.91 1.10e-  6
6 Saab          15516.      943.     16.5  1.26e- 52</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(fit1)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.s…¹ sigma stati…²   p.value    df logLik    AIC    BIC devia…³
      &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;   &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;   &lt;dbl&gt;
1     0.645     0.642 5911.    290. 1.53e-176     5 -8120. 16254. 16287. 2.79e10
# … with 2 more variables: df.residual &lt;int&gt;, nobs &lt;int&gt;, and abbreviated
#   variable names ¹​adj.r.squared, ²​statistic, ³​deviance</code></pre>
</div>
</div>
</section>
<section id="example-2-linear-regression-with-weights" class="level2">
<h2 class="anchored" data-anchor-id="example-2-linear-regression-with-weights">Example 2: linear regression with weights</h2>
<p>Then come the weights. The first thought is to update the current workflow with a line of code to make clear that weights should be used. However, this approach does not produce the desired result.</p>
<p>Therefore, an alternative approach is followed. Instead of building upon the blocks of the first example, we start with a new <code>workflow()</code> object and add an <code>add_case_weights</code> line of code to it. Next, one would expect a line of code with an <code>add_recipe</code> command, but for some reason this did not work after a “few” tries. Instead, we use <code>add_formula</code> with the regression formula as an argument. Lastly, surprisingly conventional, an <code>add_model</code> command is added.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">workflow2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span></span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb17-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_case_weights</span>(Mileage) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb17-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_formula</span>(Price <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Buick <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Cadillac <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Chevy <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Pontiac <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> Saab) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb17-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(model1)</span>
<span id="cb17-6"></span>
<span id="cb17-7">workflow2</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>══ Workflow ════════════════════════════════════════════════════════════════════
Preprocessor: Formula
Model: linear_reg()

── Preprocessor ────────────────────────────────────────────────────────────────
Price ~ 1 + Buick + Cadillac + Chevy + Pontiac + Saab

── Case Weights ────────────────────────────────────────────────────────────────
Mileage

── Model ───────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)

Computational engine: lm </code></pre>
</div>
</div>
<p>Now the parameters are estimated with one line of code as follows.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">fit2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(workflow2, db)</span>
<span id="cb19-2"></span>
<span id="cb19-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># View fit2 properties</span></span>
<span id="cb19-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>(fit2)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 5
  term        estimate std.error statistic   p.value
  &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;
1 (Intercept)   13448.      694.     19.4  8.70e- 69
2 Buick          7006.      918.      7.63 6.52e- 14
3 Cadillac      26152.      933.     28.0  7.96e-121
4 Chevy          2452.      759.      3.23 1.28e-  3
5 Pontiac        4647.      828.      5.61 2.73e-  8
6 Saab          15349.      853.     18.0  5.72e- 61</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(fit2)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.…¹  sigma stati…²   p.value    df logLik    AIC    BIC devia…³
      &lt;dbl&gt;    &lt;dbl&gt;  &lt;dbl&gt;   &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;   &lt;dbl&gt;
1     0.664    0.661 7.67e5    315. 5.90e-186     5 -8111. 16237. 16269. 4.70e14
# … with 2 more variables: df.residual &lt;int&gt;, nobs &lt;int&gt;, and abbreviated
#   variable names ¹​adj.r.squared, ²​statistic, ³​deviance</code></pre>
</div>
</div>
<p>This is a nice first try! With the two examples above it is possible to experiment further in the hope of finding alternative or shorter routes to the estimation results. In the meantime, we wait for tidymodels to include weights in the relevant packages. If you are inspired by these two examples (or not) and have some new ideas for progress, do not hesitate to <a href="https://www.tidyverse.org/blog/2022/05/case-weights/#getting-feedback">give feedback to the Tidyverse developers</a>.</p>
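<p>One alternative route that may be worth trying is the recipe-based pattern from the case weights blog post linked in the introduction: build the recipe on data that already contains the weights column, so that the recipe can recognise it as case weights. The sketch below is untested against the package versions used in this post and the object names (<code>db3</code>, <code>recipe3</code>, <code>workflow3</code>, <code>fit3</code>) are made up for illustration. It assumes that dropping <code>Saturn</code> from the data beforehand is an acceptable way to keep it out of <code>Price ~ .</code>.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(tidymodels)

# Same preparation as before, but leave out the reference category (Saturn)
# so that the formula Price ~ . does not pick it up as a predictor
db3 &lt;- select(car_prices, Price, Buick:Saab, Mileage) %&gt;%
  mutate(Mileage = frequency_weights(Mileage))

# Assumption: the recipe detects the frequency weights column in the data
# and gives it a case weights role instead of treating it as a predictor
recipe3 &lt;- recipe(Price ~ ., data = db3)

workflow3 &lt;-
  workflow() %&gt;%
  add_recipe(recipe3) %&gt;%
  add_model(linear_reg() %&gt;% set_engine("lm")) %&gt;%
  add_case_weights(Mileage)

# Estimate and inspect the coefficients
fit3 &lt;- fit(workflow3, data = db3)
tidy(fit3)</code></pre>
</div>
<p>If this route works in your setup, the coefficient table can be compared with the one from <code>fit2</code> as a check.</p>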
<p>Happy coding!</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2022,
  author = {Stam, Piet},
  title = {Use Tidymodels with Weighted and Unweighted Data},
  date = {2022-09-19},
  url = {https://www.pietstam.nl/posts/2022-09-19-tidymodels-weighted-data/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2022" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2022. <span>“Use Tidymodels with Weighted and Unweighted
Data.”</span> September 19, 2022. <a href="https://www.pietstam.nl/posts/2022-09-19-tidymodels-weighted-data/">https://www.pietstam.nl/posts/2022-09-19-tidymodels-weighted-data/</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <guid>https://www.pietstam.nl/posts/2022-09-19-tidymodels-weighted-data/</guid>
  <pubDate>Mon, 19 Sep 2022 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2022-09-19-tidymodels-weighted-data/luis-reyes-mTorQ9gFfOg-unsplash.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Shortening my URL shortener</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2022-09-04-url-shortener/</link>
  <description><![CDATA[ 






<section id="use-case" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="use-case">Use case</h2>
<p>In my presentation slides, I would like to refer people to the site where they can find the slides, but avoid those very long URLs that nobody can remember. So, instead of:</p>
<p><code>https://github.com/pjastam/talks/subdirectory/with/a/very/long/name/for/these/nice/slides</code></p>
<p>I want to have something like this:</p>
<p><code>https://pst.am/nice-presentation</code></p>
<p>Inspired by Andrew Heiss<sup>1</sup>, Kent C. Dodds<sup>2</sup> and Adrian Henry<sup>3</sup>, I managed to do so and will tell you how. In addition, since I was in the mood to make my workflow as short as possible, I went on to shorten the keystrokes in my local browser. There is a bonus section about this.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;The shortened personal URL at <a href="https://talks.andrewheiss.com/2022-03-22_uga-putting-everything-out-there/slides.html#1">this slide</a> triggered me</p></div><div id="fn2"><p><sup>2</sup>&nbsp;<a href="https://github.com/kentcdodds/netlify-shortener">GitHub repository</a> for netlify-shortener</p></div><div id="fn3"><p><sup>3</sup>&nbsp;<a href="https://hungryturtlecode.com/tutorials/shortlinks-netlify/">Blog</a> on building your own link shortener with Netlify</p></div></div></section>
<section id="prerequisites" class="level2">
<h2 class="anchored" data-anchor-id="prerequisites">Prerequisites</h2>
<ul>
<li><p>A <a href="https://github.com/"> GitHub</a> account (to set up a <code>_redirects</code> file)</p></li>
<li><p>A <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> account (to execute the <code>_redirects</code> file)</p></li>
<li><p>A Brave/Chrome browser</p></li>
</ul>
</section>
<section id="step-by-step-guide-url-shortener" class="level2">
<h2 class="anchored" data-anchor-id="step-by-step-guide-url-shortener">Step-by-step guide: URL shortener</h2>
<ul>
<li><p>Purchase a domain to use as your short link domain</p></li>
<li><p>Make up a very, very short domain name (in my case, I imagined <code>pst.am</code> would be nice)</p></li>
<li><p>Set up a new site at <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a></p></li>
<li><p>Buy your domain name from <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> during this set up</p></li>
<li><p>Alternatively, buy your domain name from another registrar, which is what I did</p>
<ul>
<li><p>I went to <a href="https://iwantmyname.com/">iwantmyname.com</a> and registered the domain name</p></li>
<li><p>At <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a>, after the new site has been set up, look up the names of the name servers <img src="https://www.pietstam.nl/posts/2022-09-04-url-shortener/netlify-nameservers.png" class="img-fluid"></p></li>
<li><p>At <a href="https://iwantmyname.com/">iwantmyname.com</a>, delegate the DNS of your domain name to <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> <img src="https://www.pietstam.nl/posts/2022-09-04-url-shortener/iwantmyname-nameservers.png" class="img-fluid"></p></li>
</ul></li>
<li><p>Create a private git repository called <code>url-shortener</code> at <a href="https://github.com/pjastam/url-shortener"> GitHub</a> <img src="https://www.pietstam.nl/posts/2022-09-04-url-shortener/github-repo.png" class="img-fluid"></p></li>
<li><p>Create a <code>_redirects</code> file and add some initial configuration lines to map short links to full URLs <img src="https://www.pietstam.nl/posts/2022-09-04-url-shortener/github-redirects.png" class="img-fluid"></p></li>
<li><p>Log in to <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a> and connect the <a href="https://github.com/"> GitHub</a> repo</p></li>
<li><p>Enable automatic TLS certificates with Let’s Encrypt for your short domain name at <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a></p></li>
<li><p>Wait until the domain name is served by <a href="https://netlify.com/"><iconify-icon inline="" icon="bxl:netlify"></iconify-icon> Netlify</a>’s DNS</p>
<ul>
<li>This may take up to 24 hours, depending on your registrar processing the change in nameservers</li>
</ul></li>
<li><p>The first line of code in the <code>_redirects</code> file is a helper function for easily editing the <code>_redirects</code> file in your <a href="https://github.com/"> GitHub</a> repository online</p>
<ul>
<li>Open your browser</li>
<li>Type <code>pst.am/edit</code></li>
<li>Commit your changes</li>
</ul></li>
<li><p>The second line of code in the <code>_redirects</code> file links to a default page, in this case the homepage of my website, if you type the short domain name without anything after it</p>
<ul>
<li>Open your browser</li>
<li>Type <code>pst.am</code></li>
<li>Hit the enter key</li>
</ul></li>
<li><p>To add new redirects, start editing the <code>_redirects</code> file and add a line of code starting with a <code>/</code> followed by <code>a short word or abbreviation</code>, then at least one space, and finally the <code>URL</code> to which the short word or abbreviation should redirect.</p></li>
</ul>
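<p>As an illustration of the format described in the last step, a <code>_redirects</code> file could look roughly like the sketch below. The entries are hypothetical and not a copy of the actual file: the branch name <code>main</code> in the edit link, the homepage URL and the <code>/nice-presentation</code> rule are assumptions for illustration only.</p>
<pre><code>/edit               https://github.com/pjastam/url-shortener/edit/main/_redirects
/                   https://www.pietstam.nl/
/nice-presentation  https://github.com/pjastam/talks/subdirectory/with/a/very/long/name/for/these/nice/slides
</code></pre>
<p>Each line holds a path starting with a slash, then whitespace, then the destination URL.</p>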
</section>
<section id="step-by-step-guide-keystrokes-shortener" class="level2">
<h2 class="anchored" data-anchor-id="step-by-step-guide-keystrokes-shortener">Step-by-step guide: Keystrokes-shortener</h2>
<ul>
<li><p>Open your Brave/Chrome browser</p></li>
<li><p>Navigate to: Settings -&gt; Search Engine -&gt; Manage Search Engines and Safe Search -&gt; Site Search -&gt; Click the <code>Add</code> button</p></li>
<li><p>Fill out <code>pst.am</code> as the search engine, <code>:x</code> as the shortcut and the URL <code>https://pst.am/%s</code> <img src="https://www.pietstam.nl/posts/2022-09-04-url-shortener/brave-add.png" class="img-fluid"></p></li>
<li><p>Click the <code>Add</code> button <img src="https://www.pietstam.nl/posts/2022-09-04-url-shortener/brave-save.png" class="img-fluid"></p></li>
<li><p>Note that in this guide I use <code>:x</code> as a shortcut. You can use this too, or replace it with another shortcut if you like</p></li>
<li><p>You can now edit the <code>_redirects</code> file in your <a href="https://github.com/"> GitHub</a> repository online as follows</p>
<ul>
<li>Open your browser</li>
<li>Type <code>:x</code></li>
<li>Press space bar or tab key</li>
<li>Type <code>edit</code></li>
<li>Hit return key</li>
</ul></li>
<li><p>Or, if you want to go to my homepage</p>
<ul>
<li>Open your browser</li>
<li>Type <code>:x</code></li>
<li>Hit return key</li>
</ul></li>
</ul>
</section>
<section id="a-next-step" class="level2">
<h2 class="anchored" data-anchor-id="a-next-step">A next step?</h2>
<p>I occasionally use my workflow to shorten a URL for a new presentation. Setting up new presentation slides does not happen every day in my case, so editing the <code>_redirects</code> file online at <a href="https://github.com/"> GitHub</a> every now and then is no bother.</p>
<p>But you may like to automate this part of the workflow if you need to shorten URLs more frequently, for example when you publish daily posts. Kent C. Dodds shows you how in <a href="https://www.youtube.com/watch?v=HL6paXyx6hM&amp;list=PLV5CVI1eNcJgCrPH_e6d57KRUTiDZgs0u">his video</a> on YouTube. The downside is that those links will be rather cryptic, like the shortened URLs you get from bit.ly, which makes them difficult to remember.</p>


</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2022,
  author = {Stam, Piet},
  title = {Shortening My {URL} Shortener},
  date = {2022-09-04},
  url = {https://www.pietstam.nl/posts/2022-09-04-url-shortener/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2022" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2022. <span>“Shortening My URL Shortener.”</span> September
4, 2022. <a href="https://www.pietstam.nl/posts/2022-09-04-url-shortener/">https://www.pietstam.nl/posts/2022-09-04-url-shortener/</a>.
</div></div></section></div> ]]></description>
  <category>digital transformation</category>
  <guid>https://www.pietstam.nl/posts/2022-09-04-url-shortener/</guid>
  <pubDate>Sun, 04 Sep 2022 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2022-09-04-url-shortener/feature_bw.png" medium="image" type="image/png" height="98" width="144"/>
</item>
<item>
  <title>Coronavirus apps: better safe than sorry (in Dutch)</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2020-04-12-coronavirus-apps-privacy-conditions/</link>
  <description><![CDATA[ 






<p>The cabinet has announced two apps for monitoring and controlling the spread of the coronavirus. One app tells you whether you have been near another user who turns out to be infected. You are then advised to stay indoors. The second app that the cabinet announced is the existing app of the OLVG hospital in Amsterdam. When you enter your temperature and any complaints about shortness of breath, sore throat and cold symptoms in the app, a hospital medical team can use the RIVM guidelines to assess your risk of infection and your need for care from a distance.</p>
<p>The cabinet is investigating whether it can make the use of these apps mandatory, but stresses that it would rather see participation on a voluntary basis. But an opt-in is not a choice that should be made too lightly in these anxious times. After all, it offers no guarantee that our data ends up in good hands and that our privacy is safeguarded.</p>
<p>The OLVG, for example, wants to use the data in anonymized form for scientific research, which is listed as a secondary purpose in the terms and conditions. But anonymization need not guarantee that our privacy is safeguarded, because linking location data and social media posts makes it possible to uncover the identity of individuals after all.</p>
<p>But if opt-in and anonymization are insufficient to guarantee our privacy, is it at all possible to roll out these apps without putting our privacy at risk? That possibility certainly exists, and below we make a start on a list of the privacy conditions that these apps must then meet.</p>
<section id="privacyvoorwaarden" class="level1">
<h1>Privacy conditions</h1>
<p>A fruitful way to safeguard our privacy when building these apps is to arrive at a list of conditions, that is, a description of a situation in which sharing personal data is acceptable from the perspective of European norms and values. Apps can then be designed and built against this list according to the principle of ‘privacy by design’. Below is our first draft of such a list of necessary conditions (click the small triangle for further explanation):</p>
<details>
<summary>
<b>Higher purpose</b>: The apps must serve a higher, general social purpose. That purpose must be clearly worded and delimited.
</summary>
<p>It is evident that such a higher purpose exists here, namely countering the spread of the coronavirus in our country. But it does not automatically follow that carrying out scientific research on these data in a general sense, the secondary purpose, is one as well. After all, the specific research questions are yet to be formulated and are unknown to users when they choose to use the apps, so they do not know what they are consenting to. In any case, collecting these data for scientific research is not the cabinet’s primary purpose.</p>
</details>
<details>
<summary>
<b>Limited period</b>: If the higher purpose is to contain a crisis, data sharing must be limited to a defined period. This period may be extended if the crisis has not yet been averted. That decision-making must meet democratic principles.
</summary>
<p>The risk of measures introduced in times of crisis is that history has shown their temporary nature to be an elastic concept, which is why firm agreements are necessary. One defined period could be to share the data for as long as the crisis has not been averted. To establish whether it has, a number of objective assessment criteria need to be formulated. One example is the reproduction number, i.e.&nbsp;the number of people who become infected through contact with one infected person. The RIVM aims to keep that number below 1 (for a prolonged time); it is sensible to agree on a fixed limit that must hold over a fixed period.</p>
</details>
<details>
<summary>
<b>Delete data</b>: Once the higher purpose has been achieved, the shared data must be deleted.
</summary>
<p>Here it matters greatly whether, besides the primary purpose of the apps, secondary purposes are also labeled as a higher purpose. If only the primary purpose of countering the spread of the coronavirus is pursued, all data must be deleted once that purpose has been achieved.</p>
<p>If the secondary purpose of scientific research is also accepted as a higher purpose, it is very much the question how long these data will be kept. An average PhD track at a university takes more than 4 years, which is already substantially longer than the maximum retention period of 2 years in the terms and conditions of the OLVG app.</p>
</details>
<details>
<summary>
<b>Data under the control of the users themselves</b>: Data must stay with the source / owner as much as possible and be stored as copies in data warehouses / hubs / markets / lakes as little as possible.
</summary>
<p>Ideally, the data entered in the apps stay on the mobile phone of the person entering them, and the processing of the data is carried out decentrally. Only the aggregated results of the locally processed data then reach the bodies that need this management information, such as the hospital, the RIVM or the GGD. This is how the Personal Health Train works, the concept that Minister Bruins <a href="https://www.rijksoverheid.nl/documenten/kamerstukken/2018/11/15/kamerbrief-over-data-laten-werken-voor-gezondheid">advocates</a> and that was developed by <a href="https://www.dtls.nl/fair-data/personal-health-train/">DTL, MAASTRO and LUMC</a>.</p>
<p>This way of working matters all the more because it prevents the data from ending up in many places for the sake of the secondary purpose. A further condition is that any data pulled to a central location for primary and secondary purposes is managed with the required security measures.</p>
</details>
<details>
<summary>
<b>Protection of interests</b>: Adequate laws &amp; regulations must be built around the use of the apps, incl.&nbsp;supervisory and judicial bodies, so that independent justice can be administered when interests are violated.
</summary>
<p>This condition is an obvious one. It must be determined whether the General Data Protection Regulation (GDPR, in Dutch: AVG) suffices or whether additional or amended legislation is needed. Among other things, it is important that individual citizens have the right to report misuse of their data and can enforce that it be stopped.</p>
</details>
<details>
<summary>
<b>Governance structure</b>: Decision-making on the development and use of the apps and the collected data must be organized so that interests are served in a balanced way.
</summary>
<p>Bits of Freedom <a href="https://www.bitsoffreedom.nl/2020/03/20/privacy-is-geen-absoluut-recht-maar-wel-een-noodzaak/">names</a> the cabinet as the ‘governance body’. In our view that does not adequately meet this condition. It gives those directly involved too little influence and does not draw on the expertise needed for sound decision-making (most members of parliament have little expertise in big data, AI and the techniques used to collect data on a large scale). What matters is that the governance structure has a balanced representation of stakeholders: government, business and citizens.</p>
</details>
<details>
<summary>
<b>Healthy incentives</b>: The system must encourage the desired data-sharing behavior (and discourage undesired behavior).
</summary>
<p>Much has been published in recent years about undesirable data sharing on the well-known social platforms. These platforms are very successful at getting people to share data, but various reservations have since been raised about the use of (personal) data for purposes that infringe on our privacy. In effect, this condition means that citizens are not enticed into sharing more data than is needed to serve the higher purpose, and that no money is made from data sharing.</p>
</details>
<details>
<summary>
<b>Keep data anonymous at the source</b>: Data must be anonymized, but as said before that is not in itself a sufficient condition.
</summary>
<p><a href="http://privacytools.seas.harvard.edu/files/privacytools/files/paper1.pdf">American research</a> based on the 1990 census has shown that many individuals within geographically delimited populations have combinations of demographic characteristics that occur infrequently. A surprising result was that just three characteristics (ZIP code, gender and date of birth) make 87% of all Americans unique or nearly unique. With a different set of three characteristics (place of residence, gender and date of birth) this holds for 53% of the total US population. Note that datasets generally record more than three items per person.</p>
<p>Anonymized data, certainly data that also include health characteristics, therefore cannot simply be regarded as anonymous. This observation is relevant to the primary purpose of the data collection, and certainly also to the secondary purpose. If data are to be used for the latter at all, it is wise to make available only a minimal dataset with which the (yet to be formulated, specific) scientific research question can be adequately answered.</p>
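<p>The uniqueness argument can be illustrated with a small R sketch. The records below are invented for illustration and the helper <code>share_unique</code> is hypothetical; it only counts how many records have a combination of characteristics that occurs exactly once, which is a much cruder measure than the one used in the cited research.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Toy records, invented for illustration only
records &lt;- data.frame(
  zip    = c("1011", "1011", "1012", "1012", "1012", "1013"),
  gender = c("F", "F", "M", "M", "F", "M"),
  birth  = c("1980-01-01", "1980-01-01", "1975-06-30",
             "1990-03-15", "1975-06-30", "1990-03-15")
)

# Hypothetical helper: share of records whose combination of the given
# columns occurs exactly once in the data
share_unique &lt;- function(data, columns) {
  key &lt;- do.call(paste, c(data[columns], sep = "|"))
  counts &lt;- table(key)
  mean(counts[key] == 1)
}

share_unique(records, "gender")                    # one characteristic
share_unique(records, c("zip", "gender", "birth")) # three characteristics</code></pre></div>
</div>
<p>In this toy example no record is unique on gender alone, while most records are unique once the three characteristics are combined, even though no name is stored.</p>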
</details>
<details>
<summary>
<b>Transparency</b>: Companies will probably build the solution because the government lacks the technical expertise. That requires transparency about the tools used.
</summary>
<p>The OLVG has announced that it wants to make the collected data publicly available to scientific researchers. Given the other conditions, it is questionable whether this is acceptable to the people whose data are concerned. In any case, auditors must be able to inspect the collected data.</p>
<p>To meet the transparency condition it is essential that the source code of the app be made public. This lets every citizen check for themselves where the data entered in the app flow to and which operations are performed on them. It should be noted that only a limited part of the population can read code. Experts can, however, formulate and share their findings in lay terms. Another important measure is that third parties must be able to reproduce the results of the processing.</p>
</details>
<details>
<summary>
<b>Validity and reliability</b>: Validity and reliability are statistical criteria requiring that the apps actually measure what they are intended to measure, and that chance plays as small a role as possible.
</summary>
<p>The higher-purpose criterion formulated earlier requires not only that a higher purpose exists, but also that the apps actually serve it. That requires clarity about how the apps, the data and the follow-up measures will achieve this purpose and how course corrections toward it will be made. So show how successful one is in achieving that purpose. There is a risk that the two apps currently fail this criterion, since the RIVM’s answers to frequently asked questions state that there is still no certainty about whether immunity develops and how long it lasts.</p>
</details>
</section>
<section id="oproep-aan-het-kabinet" class="level1">
<h1>Appeal to the cabinet</h1>
<p>We hope the cabinet will only decide to deploy apps that safeguard our privacy. Certainly if those apps are made mandatory, but equally if they can be used voluntarily. To date there is no conclusive evidence that deploying apps serves the higher purpose. ‘If it does no good, it does no harm’ does not apply here!</p>
<p>Our concerns about the spread of the coronavirus are great, but our privacy is also a great good, and fear is a poor counselor. In every respect: ‘prevention is better than cure’.</p>
</section>
<section id="commmunity" class="level1">
<h1>Community</h1>
<p>We, 3 concerned citizens, wrote this proposal in a personal capacity. Our names are:</p>
<ul>
<li>Piet Stam, health economist</li>
<li>Peter Nobels, healthcare IT management consultant</li>
<li>Marco Woesthuis, physician (non-practicing)</li>
</ul>
<p>We by no means claim to have a monopoly on wisdom; after all, our privacy concerns us all. We therefore hope for input from other concerned citizens, so that the list is sharpened and together we help the government toward an assessment framework with broad public support. That is why we have put the list of 10 principles in a <a href="https://github.com/pjastam/coronavirus-privacy/wiki">Wiki</a>, so that everyone can help refine it. All you need is a GitHub login.</p>
</section>
<section id="originele-versie" class="level1">
<h1>Original version</h1>
<p>The original version of our proposal as it appears above can be found in the public <a href="https://github.com/pjastam/coronavirus-privacy">repository</a> on <a href="https://github.com">GitHub</a>. <a href="https://nl.wikipedia.org/wiki/GitHub">Wikipedia</a> describes what GitHub is. See the README in the GitHub <a href="https://github.com/pjastam/coronavirus-privacy">repository</a> for further details and contact information.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2020,
  author = {Stam, Piet},
  title = {Coronavirus Apps: Better Safe Than Sorry (in {Dutch})},
  date = {2020-04-12},
  url = {https://www.pietstam.nl/posts/2020-04-12-coronavirus-apps-privacy-conditions/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2020" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2020. <span>“Coronavirus Apps: Better Safe Than Sorry (in
Dutch).”</span> April 12, 2020. <a href="https://www.pietstam.nl/posts/2020-04-12-coronavirus-apps-privacy-conditions/">https://www.pietstam.nl/posts/2020-04-12-coronavirus-apps-privacy-conditions/</a>.
</div></div></section></div> ]]></description>
  <category>digital transformation</category>
  <guid>https://www.pietstam.nl/posts/2020-04-12-coronavirus-apps-privacy-conditions/</guid>
  <pubDate>Sun, 12 Apr 2020 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2020-04-12-coronavirus-apps-privacy-conditions/feature_bw.png" medium="image" type="image/png" height="142" width="144"/>
</item>
<item>
  <title>The ball is round (in Dutch)</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/</link>
  <description><![CDATA[ 






<blockquote class="blockquote">
<p>I published this blog post on <a href="https://www.linkedin.com/pulse/de-bal-rond-piet-stam/">LinkedIn</a> and reproduce it here in full to guarantee its public availability.</p>
</blockquote>
<p>Ajax and PSV are fighting for points to the very end in their race for the championship. This weekend Ajax plays at home against FC Utrecht and PSV plays away against AZ. A hierarchical Bayesian Poisson analysis (ahem) of 5 seasons of match data predicts wins for Ajax and PSV, but for PSV there is also a fair chance of a draw (1-1). The championship beckons for Ajax. Nevertheless, the ball is round… #livbar #ajatot</p>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/2019-05-10-2.png" class="img-fluid"></p>
<p>The advantage of a Bayesian approach is that, besides point predictions, we can also clearly show the uncertainty that comes with them. The figure above shows the predicted probability distributions of the goals produced by the model.</p>
<p>The first three charts show the predicted distributions of the number of AZ goals (‘Histogram of home_goals’), the number of PSV goals (‘Histogram of away_goals’) and the goal difference (‘Histogram of goal_diff’). The first two charts show that AZ and PSV are each most likely to score one goal (the peak of those charts, the mode, is at 1 goal). The third chart shows that the goal difference is most likely to be 0 or -1: either AZ and PSV draw (0), or PSV scores just one goal more (-1). In the latter case PSV is the match winner after all.</p>
<p>That a PSV win should certainly not be ruled out is also clear from the fourth chart, which predicts the match result. It shows the probability distribution over a win for PSV (‘away_win’), a draw (‘equal’) and a win for AZ (‘home_win’). The probability that PSV wins the match is more than 50%. That the scales tip in PSV’s favor would then be due to that one-goal difference, which is given a high probability in the third chart.</p>
<p>If you are interested in the full list of model predictions, e.g. that of all draws 1-1 in particular has a high probability, you will find the pointer to it on <a href="https://pietstam.nl/blog/2019/05/10/bayesian-football-odds">my blog</a>. There you will also find the way to the source code and the data used, which you can use to run this analysis yourself in R.</p>
<section id="post-scriptum" class="level2">
<h2 class="anchored" data-anchor-id="post-scriptum">Post Scriptum</h2>
<blockquote class="blockquote">
<p>On <a href="https://www.linkedin.com/pulse/de-bal-rond-piet-stam/">LinkedIn</a> I added the following comment <strong>the day before the match on 12 May 2019</strong></p>
</blockquote>
<p>As the English say: “put your money where your mouth is”. So I put one euro each on a win for AZ and on the draws 0-0 and 1-1. According to my Bayesian model, the Toto betting site of the Nederlandse Loterij underestimates these probabilities, so there may be something to gain here. Fingers crossed!</p>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/2019-05-10-3.png" title="Source: Nederlandse Loterij" class="img-fluid"></p>
<blockquote class="blockquote">
<p>On <a href="https://www.linkedin.com/pulse/de-bal-rond-piet-stam/">LinkedIn</a> I added the following comment <strong>after the match on 12 May 2019</strong></p>
</blockquote>
<p>Won! No, I do not mean #Ajax winning the championship, but the Toto! Thanks to #AZ’s win, I collected 4.15 euros on a 3-euro stake, so a profit margin of 1.15 / 3 = 38%. Where do you still get that kind of return these days? Certainly not on my savings account. And all that with the help of my little #Bayesian prediction model. That bodes well for next season. Time to think about how we can put this #algorithm to meaningful use for more serious matters. #statistics #bestguess</p>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/2019-05-10-4.png" title="Source: Nederlandse Loterij" class="img-fluid"></p>
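<p>The return on the three bets can be retraced in a few lines of R. This is only a sketch of the arithmetic; it assumes that the full payout of 4.15 euros came from the single 1-euro bet on AZ, which is how I read the comment above, and it ignores everything else about how the bookmaker sets its odds.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">stake  &lt;- 3      # three bets of 1 euro each
payout &lt;- 4.15   # amount paid out after the AZ win

profit      &lt;- payout - stake   # net gain in euros
return_rate &lt;- profit / stake   # return on the total stake

# Assumption: the 1-euro AZ bet alone paid 4.15, i.e. decimal odds of 4.15.
# Such a bet breaks even when the true win probability equals 1 / odds.
az_odds         &lt;- payout / 1
break_even_prob &lt;- 1 / az_odds

round(100 * return_rate)        # return in percent
round(break_even_prob, 3)       # break-even probability of an AZ win</code></pre></div>
</div>
<p>Under that assumption, a bet on AZ has a positive expected value whenever the model puts the probability of an AZ win above the break-even probability.</p>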


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2019,
  author = {Stam, Piet},
  title = {The Ball Is Round (in {Dutch})},
  date = {2019-05-10},
  url = {https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2019" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2019. <span>“The Ball Is Round (in Dutch).”</span> May 10,
2019. <a href="https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/">https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/</a>.
</div></div></section></div> ]]></description>
  <category>sports analytics</category>
  <guid>https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/</guid>
  <pubDate>Fri, 10 May 2019 21:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2019-05-10-de-bal-is-rond/feature_bw.png" medium="image" type="image/png" height="81" width="144"/>
</item>
<item>
  <title>Modeling match results in the Dutch Eredivisie using a hierarchical Bayesian Poisson model</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/</link>
  <description><![CDATA[ 






<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<section id="quick-summary" class="level3">
<h3 class="anchored" data-anchor-id="quick-summary">Quick summary</h3>
<p>We applied the original work of <a href="http://www.sumsar.net/about.html">Rasmus Baath</a> to the Dutch Eredivisie football competition. With <code>r-bayesian-football-odds</code> we estimated the odds of football matches in the last two weeks of the 2018/2019 Dutch Eredivisie football competition. We provide the code and evaluate the results of our predictions.</p>
</section>
<section id="acknowledgements" class="level3">
<h3 class="anchored" data-anchor-id="acknowledgements">Acknowledgements</h3>
<p>This piece of work is based on the works of <a href="http://www.sumsar.net/blog/2013/07/modeling-match-results-in-la-liga-part-one/">Rasmus Baath</a>. Rasmus Baath submitted his code to the <a href="https://www.r-project.org/conferences/useR-2013/">UseR 2013 data analysis contest</a> and licensed it under the Creative Commons <a href="http://creativecommons.org/licenses/by/3.0/">Attribution 3.0 Unported license</a>.</p>
<p>He predicted the results of the last 50 matches of the 2012/2013 Spanish LaLiga season. He used data of the 2008/09-2012/13 seasons (5 seasons in total) to estimate his regression model in a <a href="https://en.wikipedia.org/wiki/Bayes_estimator">Bayesian</a> way. See <a href="https://stats.stackexchange.com/questions/252577/bayes-regression-how-is-it-done-in-comparison-to-standard-regression">this thread</a> for an intuitive explanation of the difference between the Bayesian approach and the conventional approaches of linear regression and maximum likelihood.</p>
<p>I slightly adapted his code to predict the results of the last two competition rounds (that is, the last 18 matches) of the 2018/2019 Dutch Eredivisie season. These predictions are based on soccer match data of the 2014/15-2018/19 seasons (5 seasons in total). The source of these data is <a href="http://www.football-data.co.uk/netherlandsm.php">here</a>. Out of the three model specifications that Rasmus developed, I used the most sophisticated model that allowed for year-to-year variability in team skill (called “iteration 3” by Rasmus).</p>
<p>You can find my code at <a href="https://github.com/pjastam/r-bayesian-football-odds">GitHub</a>. Rasmus deserves all the credit; I deserve all the blame for any errors in my application to the Dutch football competition.</p>
</section>
</section>
<section id="data-and-methods" class="level2">
<h2 class="anchored" data-anchor-id="data-and-methods">Data and methods</h2>
<section id="theoretical-description-of-the-model" class="level3">
<h3 class="anchored" data-anchor-id="theoretical-description-of-the-model">Theoretical description of the model</h3>
<section id="basic-model" class="level4">
<h4 class="anchored" data-anchor-id="basic-model">Basic model</h4>
<p>The first thing to notice is that not all teams are equally good. Therefore, we assume that every team has a latent skill variable and that the skill of a team <em>minus</em> the skill of the opposing team defines the predicted outcome of a game. As the number of goals is assumed to be Poisson distributed, it is natural to put the skills of the teams on the log scale of the mean of the distribution.</p>
<p>In its simplest form, the distribution of the number of goals for team <img src="https://latex.codecogs.com/png.latex?i"> when facing team <img src="https://latex.codecogs.com/png.latex?j"> is then</p>
<p><img src="https://latex.codecogs.com/png.latex?Goals_%7Bi,j%7D%20%5Csim%20%5Ctext%7BPoisson%7D(%5Clambda_%7Bi,j%7D)"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Clog(%5Clambda_%7Bi,j%7D)%20=%20%5Ctext%7Bbaseline%7D%20+%20%5Ctext%7Bskill%7D_i%20-%0A%5Ctext%7Bskill%7D_j"></p>
<p>where baseline is the log average number of goals when both teams are equally good. Note that this model description does not capture the variation in the number of goals among football seasons and between home vs away teams.</p>
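<p>To make the basic model concrete, here is a minimal R sketch. The baseline and skill values are made-up numbers for illustration, not estimates from the Eredivisie data, and the helper <code>expected_goals</code> is hypothetical.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: mean number of goals of team i when facing team j,
# following log(lambda) = baseline + skill_i - skill_j
expected_goals &lt;- function(baseline, skill_i, skill_j) {
  exp(baseline + skill_i - skill_j)
}

baseline &lt;- log(1.5)  # assumed: 1.5 goals on average between equal teams
skill_a  &lt;- 0.3       # assumed skill of team A (log scale)
skill_b  &lt;- -0.1      # assumed skill of team B (log scale)

lambda_ab &lt;- expected_goals(baseline, skill_a, skill_b)  # A scoring against B
lambda_ba &lt;- expected_goals(baseline, skill_b, skill_a)  # B scoring against A

# Poisson probabilities of team A scoring 0, 1, ..., 5 goals
goal_probs_a &lt;- dpois(0:5, lambda_ab)</code></pre></div>
</div>
<p>Because only the skill difference enters the formula, two equally skilled teams both get <code>exp(baseline)</code> as their expected number of goals.</p>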
</section>
<section id="general-model" class="level4">
<h4 class="anchored" data-anchor-id="general-model">General model</h4>
<p>In order to allow for variation in the number of goals among football seasons and between home vs away teams, we refine the distribution of the goal outcome of a match between home team <img src="https://latex.codecogs.com/png.latex?i"> and away team <img src="https://latex.codecogs.com/png.latex?j"> in season <img src="https://latex.codecogs.com/png.latex?s"> as follows:</p>
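<p><img src="https://latex.codecogs.com/png.latex?%7Bgoals%7D%5E%5Ctext%7Bhome%7D_%7Bs,i,j%7D%20%5Csim%20%5Ctext%7BPoisson%7D(%5Clambda%5E%5Ctext%7Bhome%7D_%7Bs,i,j%7D)"></p>
<p><img src="https://latex.codecogs.com/png.latex?%7Bgoals%7D%5E%5Ctext%7Baway%7D_%7Bs,i,j%7D%20%5Csim%20%5Ctext%7BPoisson%7D(%5Clambda%5E%5Ctext%7Baway%7D_%7Bs,i,j%7D)"></p>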
<p>with the <code>lambdas</code> defined as follows</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Clambda%5E%5Ctext%7Bhome%7D_%7Bs,i,j%7D%20=%20%5Cexp(%5Ctext%7Bbaseline%7D%5E%5Ctext%7Bhome%7D_s%20+%20%5Ctext%7Bskill%7D_%7Bs,i%7D%20-%20%5Ctext%7Bskill%7D_%7Bs,j%7D)"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Clambda%5E%5Ctext%7Baway%7D_%7Bs,i,j%7D%20=%20%5Cexp(%5Ctext%7Bbaseline%7D%5E%5Ctext%7Baway%7D_s%20+%20%5Ctext%7Bskill%7D_%7Bs,j%7D%20-%20%5Ctext%7Bskill%7D_%7Bs,i%7D)"></p>
<p>Note that the <code>baseline</code> is split into <code>home_baseline</code> and <code>away_baseline</code> in order to account for the home advantage. Furthermore, we introduced the season index <img src="https://latex.codecogs.com/png.latex?s"> for the baseline and skill parameters to allow for variation among seasons.</p>
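<p>As an illustration of how the two rates translate into match results, the R sketch below computes win, draw and loss probabilities for a single match. The parameter values are assumptions chosen for illustration, the helper <code>match_probs</code> is hypothetical, and the goals of both teams are treated as independent given the rates.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: result probabilities for independent Poisson goals.
# Scores above max_goals are ignored; their probability is negligible here.
match_probs &lt;- function(lambda_home, lambda_away, max_goals = 20) {
  goals &lt;- 0:max_goals
  # joint[h + 1, a + 1] = P(home scores h) * P(away scores a)
  joint &lt;- outer(dpois(goals, lambda_home), dpois(goals, lambda_away))
  goal_diff &lt;- outer(goals, goals, "-")  # home goals minus away goals
  c(home_win = sum(joint[goal_diff &gt; 0]),
    draw     = sum(joint[goal_diff == 0]),
    away_win = sum(joint[goal_diff &lt; 0]))
}

# Assumed parameter values for one season (log scale), for illustration only
home_baseline &lt;- 0.4
away_baseline &lt;- 0.1
skill_home    &lt;- 0.2
skill_away    &lt;- 0.5

lambda_home &lt;- exp(home_baseline + skill_home - skill_away)
lambda_away &lt;- exp(away_baseline + skill_away - skill_home)

probs &lt;- match_probs(lambda_home, lambda_away)</code></pre></div>
</div>
<p>This plugs in fixed parameter values. In a Bayesian analysis the same calculation would be averaged over draws from the posterior distribution of the baselines and skills, so that parameter uncertainty carries over into the predicted probabilities.</p>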
<section id="defining-the-baseline-distributions" class="level5">
<h5 class="anchored" data-anchor-id="defining-the-baseline-distributions">Defining the baseline distributions</h5>
<p>I set the prior distributions of the baselines in season <img src="https://latex.codecogs.com/png.latex?s"> to:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7Bbaseline%7D%5E%5Ctext%7Bhome%7D_s%20%5Csim%20%5Ctext%7BNormal%7D(%5Ctext%7Bbaseline%7D%5E%5Ctext%7Bhome%7D_%7Bs-1%7D,%20%5Csigma_%7B%5Ctext%7Bseasons%7D%7D%5E2)"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7Bbaseline%7D%5E%5Ctext%7Baway%7D_s%20%5Csim%20%5Ctext%7BNormal%7D(%5Ctext%7Bbaseline%7D%5E%5Ctext%7Baway%7D_%7Bs-1%7D,%20%5Csigma_%7B%5Ctext%7Bseasons%7D%7D%5E2)"></p>
<p>and in the <em>first</em> season to:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7Bbaseline%7D%5E%5Ctext%7Bhome%7D_1%20%5Csim%20%5Ctext%7BNormal%7D(0,%204%5E2)"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7Bbaseline%7D%5E%5Ctext%7Baway%7D_1%20%5Csim%20%5Ctext%7BNormal%7D(0,%204%5E2)"></p>
<p>with <code>sigma-seasons</code> defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Csigma_%5Ctext%7Bseasons%7D%20%5Csim%20%5Ctext%7BUniform%7D(0,%203)"></p>
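<p>These priors describe a random walk over seasons: each baseline is centered on its value in the previous season. The R sketch below draws one such path from the prior, purely as an illustration; the helper <code>draw_baseline_path</code> is hypothetical and is not part of the model code. Note that R’s <code>rnorm</code> takes a standard deviation, so a variance of 4<sup>2</sup> corresponds to <code>sd = 4</code>.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)      # only to make this illustration reproducible
n_seasons &lt;- 5

# sigma_seasons ~ Uniform(0, 3), shared by the home and away baselines
sigma_seasons &lt;- runif(1, 0, 3)

# Hypothetical helper: one prior draw of a baseline for every season
draw_baseline_path &lt;- function(n_seasons, sigma_seasons) {
  baseline &lt;- numeric(n_seasons)
  baseline[1] &lt;- rnorm(1, mean = 0, sd = 4)  # first season: Normal(0, 4^2)
  for (s in seq_len(n_seasons)[-1]) {
    # later seasons: centered on the baseline of the previous season
    baseline[s] &lt;- rnorm(1, mean = baseline[s - 1], sd = sigma_seasons)
  }
  baseline
}

home_baseline_path &lt;- draw_baseline_path(n_seasons, sigma_seasons)
away_baseline_path &lt;- draw_baseline_path(n_seasons, sigma_seasons)</code></pre></div>
</div>
<p>A small <code>sigma_seasons</code> keeps the baselines almost constant from season to season, while a large value lets them drift freely.</p>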
</section>
<section id="defining-the-team-skill-distributions" class="level5">
<h5 class="anchored" data-anchor-id="defining-the-team-skill-distributions">Defining the team skill distributions</h5>
<p>I set the prior distributions over the skills of team <img src="https://latex.codecogs.com/png.latex?i"> (or <img src="https://latex.codecogs.com/png.latex?j">, denoted as <img src="https://latex.codecogs.com/png.latex?i%7Cj">) in season <img src="https://latex.codecogs.com/png.latex?s"> to:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7Bskill%7D_%7Bs,i%7Cj%7D%20%5Csim%20%5Ctext%7BNormal%7D(%5Ctext%7Bskill%7D_%7Bs-1,i%7Cj%7D,%20%5Csigma_%7B%5Ctext%7Bseasons%7D%7D%5E2)"></p>
<p>and in the <em>first</em> season to:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7Bskill%7D_%7B1,i%7Cj%7D%20%5Csim%20%5Ctext%7BNormal%7D(%5Cmu_%5Ctext%7Bteams%7D,%20%5Csigma_%7B%5Ctext%7Bteams%7D%7D%5E2)"></p>
<p>with the <code>sigma-seasons</code> defined as above and <code>mu-teams</code> and <code>sigma-teams</code> defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmu_%5Ctext%7Bteams%7D%20%5Csim%20%5Ctext%7BNormal%7D(0,%204%5E2)"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Csigma_%5Ctext%7Bteams%7D%20%5Csim%20%5Ctext%7BUniform%7D(0,%203)"></p>
<p>We apply a normalizing restriction with respect to the (arbitrarily chosen) <em>first</em> team in each season <img src="https://latex.codecogs.com/png.latex?s"> as follows</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7Bskill%7D_%7Bs,1%7D%20=%200"></p>
<p>We choose very vague priors. For example, the priors on the baselines have an SD of 4. Since this is on the log scale of the mean number of goals, one SD either side of the mean <img src="https://latex.codecogs.com/png.latex?0"> covers a range of <img src="https://latex.codecogs.com/png.latex?%5B0.02,%2054.6%5D"> goals. A very wide prior indeed.</p>
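<p>The quoted range follows directly from exponentiating one SD below and above the prior mean, which can be checked in R:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">prior_mean &lt;- 0
prior_sd   &lt;- 4

# One SD either side of the mean, mapped from the log scale to goals
goal_range &lt;- exp(prior_mean + c(-1, 1) * prior_sd)
round(goal_range, 2)

# Share of the prior mass within one SD of the mean
mass_within_one_sd &lt;- pnorm(1) - pnorm(-1)</code></pre></div>
</div>
<p>So roughly two thirds of the prior mass lies between about 0.02 and 54.6 goals per match, and the remaining third lies outside even that range.</p>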
</section>
</section>
<section id="probabilistic-graphical-model" class="level4">
<h4 class="anchored" data-anchor-id="probabilistic-graphical-model">Probabilistic Graphical Model</h4>
<p>We graphed the dependencies described above as a probabilistic graphical model. To this end, we used the CRAN package <a href="https://cran.r-project.org/web/packages/DiagrammeR/index.html">DiagrammeR</a>, which gives access to the <a href="https://graphviz.gitlab.io/">Graph Visualization Software</a>.</p>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/pgm-1.png" class="img-fluid"></p>
</section>
</section>
<section id="read-data" class="level3">
<h3 class="anchored" data-anchor-id="read-data">Read data</h3>
<p>We first read the soccer match data of the 2014/15-2018/19 Dutch Eredivisie seasons from the original CSV files and cache them. The result is a database called <code>eredivisie</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">from_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2014</span></span>
<span id="cb1-2">to_year <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2019</span></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">source</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"functions/Import_Data_Eredivisie.R"</span>))</span></code></pre></div></div>
</div>
<p>Then the cached <code>eredivisie</code> data are cleaned and new variables are created.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">eredivisie <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> eredivisie <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">MatchResult =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sign</span>(HomeGoals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> AwayGoals)) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># -1 Away win, 0 Draw, 1 Home win</span></span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Creating a data frame d with only the complete match results</span></span>
<span id="cb2-5">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">na.omit</span>(eredivisie)</span>
<span id="cb2-6"></span>
<span id="cb2-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Lists with the unique names of all teams and all seasons in the database</span></span>
<span id="cb2-8">teams <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeTeam, d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>AwayTeam))</span>
<span id="cb2-9">seasons <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Season)</span>
<span id="cb2-10"></span>
<span id="cb2-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># A list for JAGS with the data from d where the strings are coded as integers</span></span>
<span id="cb2-12">data_list <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">HomeGoals =</span> d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">AwayGoals =</span> d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>AwayGoals, </span>
<span id="cb2-13">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">HomeTeam =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeTeam, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels=</span>teams)),</span>
<span id="cb2-14">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">AwayTeam =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>AwayTeam, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels=</span>teams)),</span>
<span id="cb2-15">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Season =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">factor</span>(d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Season, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">levels=</span>seasons)),</span>
<span id="cb2-16">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_teams =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(teams), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_games =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(d), </span>
<span id="cb2-17">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_seasons =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(seasons))</span></code></pre></div></div>
</div>
<p>The data set <code>eredivisie</code> contains data from 5 different seasons. In this model we allow for variability in year-to-year team performance. This variability can be seen in the following diagram, which shows that some teams do not even participate in all seasons in the <code>eredivisie</code> data set, as a result of dropping out of the first division:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qplot</span>(Season, HomeTeam, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data=</span>d, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylab=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Team"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Season"</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/participation_by_season.png" class="img-fluid" width="630"></p>
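<p>The same participation pattern can also be tabulated instead of plotted. The following is a minimal sketch on a made-up data frame: the object <code>toy</code> and its teams and seasons are assumptions for illustration only, not the actual <code>eredivisie</code> data.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># Toy stand-in for the match data (hypothetical teams and seasons)
toy &lt;- data.frame(
  Season   = c("2016/2017", "2016/2017", "2017/2018", "2017/2018"),
  HomeTeam = c("Ajax", "Roda JC", "Ajax", "Fortuna Sittard"),
  AwayTeam = c("Roda JC", "Ajax", "Fortuna Sittard", "Ajax"),
  stringsAsFactors = FALSE
)

# One row per (season, team) pair, regardless of playing home or away
long &lt;- unique(data.frame(
  Season = rep(toy$Season, 2),
  Team   = c(toy$HomeTeam, toy$AwayTeam)
))

# Number of seasons in which each team appears
seasons_per_team &lt;- table(long$Team)
print(seasons_per_team)</code></pre>
</div>
<p>Teams with a count below the number of seasons are the ones that were absent from the first division for at least one season.</p>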
</section>
</section>
<section id="estimation-simulation-and-validation" class="level2 page-columns page-full">
<h2 class="anchored" data-anchor-id="estimation-simulation-and-validation">Estimation, simulation and validation</h2>
<section id="model-estimation" class="level3 page-columns page-full">
<h3 class="anchored" data-anchor-id="model-estimation">Model estimation</h3>
<p>Turning this into a JAGS model results in the following string. Note that the model loops over all seasons and all match results. JAGS parameterizes the normal distribution with precision (the reciprocal of the variance) instead of variance, so the hyperpriors have to be converted. Finally, we “anchor” the skill of one team to a constant; otherwise the mean skill can drift freely. These adjustments result in the following model description:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">m3_string <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"model {</span></span>
<span id="cb4-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">for(i in 1:n_games) {</span></span>
<span id="cb4-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">HomeGoals[i] ~ dpois(lambda_home[Season[i], HomeTeam[i],AwayTeam[i]])</span></span>
<span id="cb4-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">AwayGoals[i] ~ dpois(lambda_away[Season[i], HomeTeam[i],AwayTeam[i]])</span></span>
<span id="cb4-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-6"></span>
<span id="cb4-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">for(season_i in 1:n_seasons) {</span></span>
<span id="cb4-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">for(home_i in 1:n_teams) {</span></span>
<span id="cb4-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">for(away_i in 1:n_teams) {</span></span>
<span id="cb4-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">lambda_home[season_i, home_i, away_i] &lt;- exp( home_baseline[season_i] + skill[season_i, home_i] - skill[season_i, away_i])</span></span>
<span id="cb4-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">lambda_away[season_i, home_i, away_i] &lt;- exp( away_baseline[season_i] + skill[season_i, away_i] - skill[season_i, home_i])</span></span>
<span id="cb4-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-15"></span>
<span id="cb4-16"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">skill[1, 1] &lt;- 0 </span></span>
<span id="cb4-17"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">for(j in 2:n_teams) {</span></span>
<span id="cb4-18"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">skill[1, j] ~ dnorm(group_skill, group_tau)</span></span>
<span id="cb4-19"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-20"></span>
<span id="cb4-21"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">group_skill ~ dnorm(0, 0.0625)</span></span>
<span id="cb4-22"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">group_tau &lt;- 1/pow(group_sigma, 2)</span></span>
<span id="cb4-23"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">group_sigma ~ dunif(0, 3)</span></span>
<span id="cb4-24"></span>
<span id="cb4-25"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">home_baseline[1] ~ dnorm(0, 0.0625)</span></span>
<span id="cb4-26"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">away_baseline[1] ~ dnorm(0, 0.0625)</span></span>
<span id="cb4-27"></span>
<span id="cb4-28"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">for(season_i in 2:n_seasons) {</span></span>
<span id="cb4-29"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">skill[season_i, 1] &lt;- 0 </span></span>
<span id="cb4-30"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">for(j in 2:n_teams) {</span></span>
<span id="cb4-31"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">skill[season_i, j] ~ dnorm(skill[season_i - 1, j], season_tau)</span></span>
<span id="cb4-32"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-33"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">home_baseline[season_i] ~ dnorm(home_baseline[season_i - 1], season_tau)</span></span>
<span id="cb4-34"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">away_baseline[season_i] ~ dnorm(away_baseline[season_i - 1], season_tau)</span></span>
<span id="cb4-35"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb4-36"></span>
<span id="cb4-37"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">season_tau &lt;- 1/pow(season_sigma, 2) </span></span>
<span id="cb4-38"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">season_sigma ~ dunif(0, 3) </span></span>
<span id="cb4-39"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">}"</span></span></code></pre></div></div>
</div>
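<p>The conversion between standard deviation and precision is easy to check by hand. As a small sketch (the helper names <code>sd_to_tau</code> and <code>tau_to_sd</code> are my own and not part of JAGS or rjags): a precision of 0.0625, as used in <code>dnorm(0, 0.0625)</code> in the model string, corresponds to a standard deviation of 4.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helpers: standard deviation to precision and back
sd_to_tau &lt;- function(sigma) 1 / sigma^2
tau_to_sd &lt;- function(tau) 1 / sqrt(tau)

tau &lt;- sd_to_tau(4)    # 1 / 16 = 0.0625
sigma &lt;- tau_to_sd(tau) # back to 4

print(c(tau = tau, sigma = sigma))</code></pre>
</div>
<p>The same transformation appears inside the model itself, where <code>group_tau</code> and <code>season_tau</code> are computed as <code>1/pow(sigma, 2)</code> from their uniform priors on the standard deviation scale.</p>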
<p>We then run this model directly from R using the <code>rjags</code> package and the <code>textConnection</code> function.<sup>1</sup> This takes about half an hour on my computer, but the run time of course depends on the hardware at hand.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup>&nbsp;The code shown here is the version used when the original blog post was written. The latest version adds parallel computing and can be found <a href="https://github.com/pjastam/r-bayesian-football-odds">in my GitHub repository</a>.</p></div></div><div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Compiling the model</span></span>
<span id="cb5-2">m3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">jags.model</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">textConnection</span>(m3_string), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data=</span>data_list, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n.chains=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n.adapt=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span>
<span id="cb5-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Burning some samples on the altar of the MCMC god</span></span>
<span id="cb5-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(m3, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span>
<span id="cb5-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generating MCMC samples</span></span>
<span id="cb5-6">s3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coda.samples</span>(m3, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">variable.names=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"home_baseline"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"away_baseline"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skill"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"season_sigma"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"group_sigma"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"group_skill"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n.iter=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40000</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">thin=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span>
<span id="cb5-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Merging the three MCMC chains into one matrix</span></span>
<span id="cb5-8">ms3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>(s3)</span></code></pre></div></div>
</div>
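<p>As a quick sanity check on the sampler settings above, the number of retained draws follows from <code>n.chains</code>, <code>n.iter</code> and <code>thin</code>. The sketch below is plain arithmetic, under the assumption that every chain keeps one draw out of every <code>thin</code> iterations; the merged matrix <code>ms3</code> should then have this many rows.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">n_chains &lt;- 3      # n.chains in jags.model()
n_iter   &lt;- 40000  # n.iter in coda.samples()
thin     &lt;- 8      # thin in coda.samples()

# Draws kept per chain after thinning
draws_per_chain &lt;- n_iter / thin

# Rows expected after merging all chains into one matrix
total_draws &lt;- n_chains * draws_per_chain

print(c(per_chain = draws_per_chain, total = total_draws))</code></pre>
</div>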
<p>The following graphs show the trace plots and probability distributions of the team mean (<code>group_skill</code>), team sigma (<code>group_sigma</code>) and season sigma (<code>season_sigma</code>) parameters, respectively.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(s3[, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"group_skill"</span>])</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/mu_sigma_params-1.png" class="img-fluid" width="810"></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(s3[, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"group_sigma"</span>])</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/mu_sigma_params-2.png" class="img-fluid" width="810"></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot</span>(s3[, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"season_sigma"</span>])</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/mu_sigma_params-3.png" class="img-fluid" width="810"></p>
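<p>Trace plots can be complemented with numerical convergence checks from the <code>coda</code> package. The sketch below is not part of the original analysis: it runs <code>gelman.diag</code> and <code>effectiveSize</code> on three simulated, well-behaved chains (the object <code>toy_chains</code> and its numbers are made up), so the diagnostics can be tried without refitting the model. With the real fit, one would pass a subset of <code>s3</code> instead.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(coda)

set.seed(42)

# Simulate one independent, stationary chain for a single parameter
make_chain &lt;- function() {
  mcmc(matrix(rnorm(5000, mean = 0.1, sd = 0.05), ncol = 1,
              dimnames = list(NULL, "group_skill")))
}
toy_chains &lt;- mcmc.list(make_chain(), make_chain(), make_chain())

# Gelman-Rubin potential scale reduction factor; values near 1 suggest
# that the chains agree with each other
diag &lt;- gelman.diag(toy_chains, multivariate = FALSE)
psrf &lt;- diag$psrf[, "Point est."]

# Effective sample size, summed over the chains
ess &lt;- effectiveSize(toy_chains)

print(psrf)
print(ess)</code></pre>
</div>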
<p>We can also calculate the default home advantage as the difference <code>exp(home_baseline) - exp(away_baseline)</code>. The next graph shows a home advantage of more than 0.4 goals on average (posterior mean 0.43), and the 95% HDI of 0.29 to 0.60 excludes zero.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plotPost</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(ms3[,<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"home_baseline"</span>,to_year<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>from_year)]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(ms3[,<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"away_baseline"</span>,to_year<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>from_year)]), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">compVal =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Home advantage in number of goals"</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/overall_home_advantage-1.png" class="img-fluid" width="810"></p>
<pre><code>###                                        mean    median      mode hdiMass
### Home advantage in number of goals 0.4288232 0.4248015 0.4162575    0.95
###                                      hdiLow   hdiHigh compVal pcGTcompVal
### Home advantage in number of goals 0.2880692 0.6003385       0           1
###                                   ROPElow ROPEhigh pcInROPE
### Home advantage in number of goals      NA       NA       NA</code></pre>
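<p>The helper <code>col_name</code> used above is not defined in this part of the post. A plausible sketch, assuming it only builds JAGS-style column names such as <code>skill[2,5]</code>, is given below together with the home advantage calculation on made-up posterior draws. The matrix <code>ms_toy</code> and its numbers are illustrative assumptions, not output of the fitted model.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># Hypothetical helper: builds column names such as "skill[2,5]"
col_name &lt;- function(name, ...) {
  paste0(name, "[", paste(..., sep = ","), "]")
}
print(col_name("home_baseline", 5))
print(col_name("skill", 2, 5))

# Toy posterior draws on the log scale (made-up numbers)
set.seed(1)
ms_toy &lt;- cbind(rnorm(1000, 0.45, 0.05), rnorm(1000, 0.15, 0.05))
colnames(ms_toy) &lt;- c(col_name("home_baseline", 5),
                      col_name("away_baseline", 5))

# Home advantage in goals: difference of the expected goal counts
home_adv &lt;- exp(ms_toy[, col_name("home_baseline", 5)]) -
            exp(ms_toy[, col_name("away_baseline", 5)])

print(mean(home_adv))      # posterior mean of the difference
print(mean(home_adv &gt; 0))  # share of draws above the comparison value 0</code></pre>
</div>
<p>Because the baselines live on the log scale, the difference is taken after exponentiating, which expresses the home advantage directly in goals per game.</p>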
</section>
<section id="generating-predictions-in--and-out-of-sample" class="level3">
<h3 class="anchored" data-anchor-id="generating-predictions-in--and-out-of-sample">Generating predictions (in- and out-of-sample)</h3>
<p>In the <code>eredivisie</code> data set included in this project, the results of the last 18 games of the 2018/2019 season are missing. Using our model we can now both predict and simulate the outcomes of these 18 games. The R code below calculates a number of measures for each game (both the games with known and unknown outcomes):</p>
<ul>
<li>The mode of the simulated number of goals, that is, the <em>most likely</em> number of scored goals. If we were asked to bet on the number of goals in a game this is what we would use.</li>
<li>The mean of the simulated number of goals. This is our best guess of the average number of goals in a game.</li>
<li>The most likely match result for each game.</li>
<li>A random sample from the distributions of credible home scores, away scores and match results. This is how the Eredivisie could actually have played out in an alternative reality.</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(ms3)</span>
<span id="cb11-2">m3_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sapply</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(eredivisie), <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(i) {</span>
<span id="cb11-3">  home_team <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(teams <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeTeam[i])</span>
<span id="cb11-4">  away_team <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(teams <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>AwayTeam[i])</span>
<span id="cb11-5">  season <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(seasons <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>Season[i])</span>
<span id="cb11-6">  home_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skill"</span>, season, home_team)] </span>
<span id="cb11-7">  away_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skill"</span>, season, away_team)]</span>
<span id="cb11-8">  home_baseline <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"home_baseline"</span>, season)]</span>
<span id="cb11-9">  away_baseline <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"away_baseline"</span>, season)]</span>
<span id="cb11-10">  </span>
<span id="cb11-11">  home_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rpois</span>(n, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(home_baseline <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> home_skill <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> away_skill))</span>
<span id="cb11-12">  away_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rpois</span>(n, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(away_baseline <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> away_skill <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> home_skill))</span>
<span id="cb11-13">  home_goals_table <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(home_goals)</span>
<span id="cb11-14">  away_goals_table <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(away_goals)</span>
<span id="cb11-15">  match_results <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sign</span>(home_goals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> away_goals)</span>
<span id="cb11-16">  match_results_table <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(match_results)</span>
<span id="cb11-17">  </span>
<span id="cb11-18">  mode_home_goal <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(home_goals_table)[ <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which.max</span>(home_goals_table)])</span>
<span id="cb11-19">  mode_away_goal <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(away_goals_table)[ <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which.max</span>(away_goals_table)])</span>
<span id="cb11-20">  match_result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(match_results_table)[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which.max</span>(match_results_table)])</span>
<span id="cb11-21">  rand_i <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq_along</span>(home_goals), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb11-22">  </span>
<span id="cb11-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode_home_goal =</span> mode_home_goal, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode_away_goal =</span> mode_away_goal, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">match_result =</span> match_result,</span>
<span id="cb11-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_home_goal =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(home_goals), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_away_goal =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(away_goals),</span>
<span id="cb11-25">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rand_home_goal =</span> home_goals[rand_i], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rand_away_goal =</span> away_goals[rand_i],</span>
<span id="cb11-26">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rand_match_result =</span> match_results[rand_i])</span>
<span id="cb11-27">})</span>
<span id="cb11-28">m3_pred <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t</span>(m3_pred)</span></code></pre></div></div>
</div>
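<p>To make the summary measures concrete, here is a minimal sketch on eight hand-picked simulated scores for a single game. The vectors <code>home_goals</code> and <code>away_goals</code> are made up for illustration; in the code above they come from <code>rpois</code> draws based on the posterior samples.</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># Eight made-up simulated scores for one game
home_goals &lt;- c(0, 1, 1, 2, 1, 3, 0, 1)
away_goals &lt;- c(1, 1, 0, 0, 0, 1, 0, 2)

# Mode: the most frequent number of home goals
home_tab &lt;- table(home_goals)
mode_home &lt;- as.numeric(names(home_tab)[which.max(home_tab)])

# Mean: the average number of home goals
mean_home &lt;- mean(home_goals)

# Match result per draw: -1 away win, 0 draw, 1 home win
results &lt;- sign(home_goals - away_goals)
results_tab &lt;- table(results)
likely_result &lt;- as.numeric(names(results_tab)[which.max(results_tab)])

print(c(mode_home = mode_home, mean_home = mean_home,
        likely_result = likely_result))</code></pre>
</div>
<p>Note that <code>which.max</code> returns the first maximum, so in case of a tie the lowest value wins (for match results, that would be the away win).</p>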
</section>
<section id="model-validation" class="level3">
<h3 class="anchored" data-anchor-id="model-validation">Model validation</h3>
<p>First, let’s compare the distribution of the actual number of goals in the data with the predicted mode, mean and randomized number of goals for all the games (focusing on the number of goals for the home team).</p>
<p>We start with the actual distribution of the number of goals for the home teams.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks=</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Distribution of the number of goals</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">scored by a home team in a match"</span>,</span>
<span id="cb12-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/hist_home_goal-1.png" class="img-fluid" width="360"></p>
<p>This next plot shows the distribution of the modes from the predicted distribution of home goals from each game. That is, what is the most probable outcome, for the home team, in each game.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mode_home_goal"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks=</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mode_home_goal"</span>])) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb13-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Distribution of predicted most </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">probable score by a home team in</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">a match"</span>,</span>
<span id="cb13-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/mode_home_goal-1.png" class="img-fluid" width="360"></p>
<p>For almost all games the single most likely number of goals is one. In fact, if you know nothing about an Eredivisie game, betting on one goal for the home team is the best bet 78% of the time.</p>
<p>Let’s instead look at the distribution of the predicted mean number of home goals in each game.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean_home_goal"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks=</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean_home_goal"</span>])) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb14-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Distribution of predicted mean </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> score by a home team in a match"</span>,</span>
<span id="cb14-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/mean_home_goal-1.png" class="img-fluid" width="360"></p>
<p>For most games the expected number of home goals is about two. That is, even if your safest bet is one goal, you would expect to see around two goals.</p>
<p>The distributions of the mode and the mean number of goals don’t look remotely like the distribution of the actual number of goals. That is no surprise: both are point summaries of each match’s predicted distribution, so they discard most of the match-to-match variation. We would, however, expect the distribution of randomized goals (where for each match the number of goals has been randomly drawn from that match’s predicted home goal distribution) to look similar to the actual number of home goals. Looking at the histogram below, this seems to be the case.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rand_home_goal"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks=</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rand_home_goal"</span>])) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb15-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Distribution of randomly drawn </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> score by a home team in a match"</span>,</span>
<span id="cb15-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/rand_home_goal-1.png" class="img-fluid" width="360"></p>
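<p>The contrast between the histograms above can be reproduced with a small toy simulation. This is a minimal sketch and not the fitted model: it assumes Poisson-distributed goals, and the per-match rates in <code>lambda</code> are hypothetical values invented for illustration.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
# Hypothetical expected home goals per match (an assumption, not model output)
lambda &lt;- runif(1000, min = 1.2, max = 2.2)
# Stand-in for the observed goals: one Poisson draw per match
actual_goal &lt;- rpois(length(lambda), lambda)
# Point summaries of each match's predicted distribution
mode_goal &lt;- floor(lambda)  # mode of a Poisson with a non-integer rate
mean_goal &lt;- lambda
# One random draw per match from its predicted distribution
rand_goal &lt;- rpois(length(lambda), lambda)
# Compare the spread across matches of each quantity
c(actual = var(actual_goal), mode = var(mode_goal),
  mean = var(mean_goal), rand = var(rand_goal))</code></pre></div>
</div>
<p>In this toy setting the mode and the mean vary far less across matches than the simulated scores do, while the random draws have a spread comparable to the simulated scores. That is the same pattern the histograms show for the real predictions.</p>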
<p>We can also look at how well the model predicts the data. This should probably be done using cross validation, but as the number of effective parameters is much smaller than the number of data points, a direct comparison should at least give an estimated prediction accuracy in the right ballpark.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mode_home_goal"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span>TRUE)</span></code></pre></div></div>
</div>
<pre><code>### [1] 0.3150232</code></pre>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>((eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean_home_goal"</span>])<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span>TRUE)</span></code></pre></div></div>
</div>
<pre><code>### [1] 1.509597</code></pre>
<p>So the model’s most probable score matches the actual number of home goals about 31.5% of the time, and its predicted mean number of home goals has a mean squared error of 1.51. Now we’ll look at the actual and predicted match outcomes. The graph below shows the match outcomes in the data with 1 being a home win, 0 being a draw and -1 being a win for the away team.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>MatchResult, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks=</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Actual match results"</span>,</span>
<span id="cb20-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/hist_actual_match_result-1.png" class="img-fluid" width="360"></p>
<p>Now looking at the most probable outcomes of the matches according to the model.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"match_result"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks=</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Predicted match results"</span>,</span>
<span id="cb21-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/hist_pred_match_result-1.png" class="img-fluid" width="360"></p>
<p>For almost all matches the safest bet is to bet on the home team. While draws are not uncommon, a draw is <em>never</em> the safest bet.</p>
<p>As in the case with the number of home goals, the randomized match outcomes have a distribution similar to the actual match outcomes:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rand_match_result"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks=</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">main =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Randomized match results"</span>,</span>
<span id="cb22-2">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/hist_rand_match_result-1.png" class="img-fluid" width="360"></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>MatchResult <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> m3_pred[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"match_result"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm=</span>TRUE)</span></code></pre></div></div>
</div>
<pre><code>### [1] 0.563865</code></pre>
<p>The model predicts the correct match outcome (i.e.&nbsp;home team wins / a draw / away team wins) 56% of the time. Pretty good!</p>
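<p>To judge whether that hit rate is good, one possible yardstick is the naive rule of always betting on the home team. The sketch below uses a short, made-up vector of results (coded as in the plots above) rather than the real data.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical match results: 1 = home win, 0 = draw, -1 = away win
toy_result &lt;- c(1, 1, 0, -1, 1, -1, 1, 0, 1, -1)
# Naive rule: always predict a home win
baseline_pred &lt;- rep(1, length(toy_result))
# Share of matches the naive rule gets right
baseline_acc &lt;- mean(toy_result == baseline_pred)
baseline_acc</code></pre></div>
</div>
<p>On the actual data the same comparison would be <code>mean(eredivisie$MatchResult == 1, na.rm = TRUE)</code>. The model’s accuracy is most informative when read against that number.</p>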
</section>
</section>
<section id="results" class="level2">
<h2 class="anchored" data-anchor-id="results">Results</h2>
<section id="the-ranking-of-the-teams" class="level3">
<h3 class="anchored" data-anchor-id="the-ranking-of-the-teams">The ranking of the teams</h3>
<p>We’ll start by ranking the teams of the <code>Eredivisie</code> using the estimated skill parameters from the 2018/2019 season, which are based on the estimation sample of the five seasons 2014/2015-2018/2019. Note that for one of the teams the skill parameter is “anchored at zero”. This “anchoring” is done for the very same “identification” reason that one of the parameters in a traditional logit analysis is always set to zero by default: the value of a parameter automatically follows if you already know all the other parameters in your model and given the fact that probabilities always sum up to 1 in total.</p>
<p>Consequently, as Rasmus noted before, the skill parameters are difficult to interpret as they are relative to the skill of the team that had its skill parameter “anchored” at zero. To put them on a more interpretable scale, the skill parameters are first zero-centered by subtracting the mean skill of all teams. The home baseline is then added and the resulting values are exponentiated. These rescaled skill parameters are now on the scale of the expected number of goals when playing as a home team.</p>
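<p>The rescaling can be sketched on a toy example first. Everything in it is hypothetical: two posterior draws (rows) for three made-up teams (columns) on the log scale, plus one made-up home baseline per draw.</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical skill draws: rows are posterior draws, columns are teams
toy_skill &lt;- matrix(c(0.0, 0.4, -0.4,
                      0.0, 0.6, -0.3),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(NULL, c("A", "B", "C")))
# Hypothetical home baseline, one value per posterior draw
toy_baseline &lt;- c(0.3, 0.4)
# Step 1: zero-center the skills within each draw
centered &lt;- toy_skill - rowMeans(toy_skill)
# Step 2: add the home baseline; step 3: exponentiate
expected_home_goals &lt;- exp(centered + toy_baseline)
expected_home_goals</code></pre></div>
</div>
<p>Each row is centered separately because each row is one posterior draw, which mirrors the use of <code>rowMeans</code> in the post’s own code below.</p>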
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">team_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">string=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colnames</span>(ms3), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skill\\["</span>,to_year<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>from_year,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">","</span>))]</span>
<span id="cb25-2">team_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> (team_skill <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowMeans</span>(team_skill)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"home_baseline["</span>,to_year<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>from_year,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"]"</span>)]</span>
<span id="cb25-3">team_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(team_skill)</span>
<span id="cb25-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colnames</span>(team_skill) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> teams</span>
<span id="cb25-5">team_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> team_skill[,<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">order</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colMeans</span>(team_skill), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decreasing=</span>TRUE)]</span>
<span id="cb25-6">old_par <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mar=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>,<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xaxs=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'i'</span>)</span>
<span id="cb25-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">caterplot</span>(team_skill, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels.loc=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"above"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">val.lim=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.7</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3.8</span>))</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/team_skill-1.png" class="img-fluid" width="810"></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(old_par)</span></code></pre></div></div>
</div>
<p>Two teams are clearly ahead of the rest: Ajax and PSV. Let’s look at the credible difference between these two teams. Ajax is a better team than PSV with a probability of about 73%, i.e.&nbsp;the odds in favor of Ajax are roughly 73% / 27% ≈ 2.7. Loosely speaking, Ajax comes out on top in almost three out of four comparisons. Note that this is the probability that Ajax has the higher skill parameter, not the probability that Ajax beats PSV in any single match.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plotPost</span>(team_skill[, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Ajax"</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> team_skill[, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PSV Eindhoven"</span>], <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">compVal =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlab =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;- PSV     vs     Ajax -&gt;"</span>)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/team_skill_PSV_Ajax-1.png" class="img-fluid" width="630"></p>
<pre><code>###                                mean    median      mode hdiMass     hdiLow
### &lt;- PSV     vs     Ajax -&gt; 0.1616095 0.1535391 0.1467396    0.95 -0.3586123
###                             hdiHigh compVal pcGTcompVal ROPElow ROPEhigh
### &lt;- PSV     vs     Ajax -&gt; 0.6842824       0      0.7312      NA       NA
###                           pcInROPE
### &lt;- PSV     vs     Ajax -&gt;       NA</code></pre>
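<p>The probability and the odds quoted above follow directly from the posterior draws of the skill difference. A minimal sketch, assuming normally distributed stand-in draws whose mean and spread were chosen by hand to resemble the summary above (they are not the actual MCMC samples):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(42)
# Hypothetical stand-in for team_skill[, "Ajax"] - team_skill[, "PSV Eindhoven"]
skill_diff &lt;- rnorm(10000, mean = 0.16, sd = 0.27)
# Share of draws in which Ajax has the higher skill
p_ajax &lt;- mean(skill_diff &gt; 0)
# Corresponding odds in favor of Ajax
odds_ajax &lt;- p_ajax / (1 - p_ajax)
c(probability = p_ajax, odds = odds_ajax)</code></pre></div>
</div>
<p>With the real draws, <code>mean(team_skill[, "Ajax"] &gt; team_skill[, "PSV Eindhoven"])</code> should give the value that <code>plotPost</code> reports as <code>pcGTcompVal</code>.</p>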
</section>
<section id="predicting-the-future" class="level3">
<h3 class="anchored" data-anchor-id="predicting-the-future">Predicting the future</h3>
<p>Now that we’ve checked that the model reasonably predicts the Eredivisie history, let’s predict the Eredivisie endgame!</p>
<p>When I ran my version of this model on the Dutch Eredivisie competition (2019-05-10), most matches of the 2018/2019 season had already been played. Only two of the 34 competition rounds were still to be played (rounds 33 and 34). With these two rounds to go, Ajax and PSV both have 80 points, but Ajax leads the competition because its goal difference (111-30 = 81) is larger than that of PSV (95-24 = 71). The code below displays the predicted mean and mode of the number of goals for the endgame, along with the predicted winner of each game.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">eredivisie_forecast <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> eredivisie[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Season"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Week"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HomeTeam"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AwayTeam"</span>)]</span>
<span id="cb29-2">m3_forecast <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> m3_pred[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals),] </span>
<span id="cb29-3">eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_home_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(m3_forecast[,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean_home_goal"</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) </span>
<span id="cb29-4">eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mean_away_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(m3_forecast[,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean_away_goal"</span>], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb29-5">eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mode_home_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> m3_forecast[,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mode_home_goal"</span>] </span>
<span id="cb29-6">eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>mode_away_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> m3_forecast[,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mode_away_goal"</span>]</span>
<span id="cb29-7">eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>predicted_winner <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(m3_forecast[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"match_result"</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeTeam, </span>
<span id="cb29-8">                                           <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(m3_forecast[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"match_result"</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>AwayTeam, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Draw"</span>))</span>
<span id="cb29-9"></span>
<span id="cb29-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rownames</span>(eredivisie_forecast) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb29-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xtable</span>(eredivisie_forecast, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cccccccccc"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"html"</span>)</span></code></pre></div></div>
</div>
<table class="caption-top table">
<colgroup>
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
<col style="width: 10%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th style="text-align: center;">Season</th>
<th style="text-align: center;">Week</th>
<th style="text-align: center;">HomeTeam</th>
<th style="text-align: center;">AwayTeam</th>
<th style="text-align: center;">mean_home_goals</th>
<th style="text-align: center;">mean_away_goals</th>
<th style="text-align: center;">mode_home_goals</th>
<th style="text-align: center;">mode_away_goals</th>
<th style="text-align: center;">predicted_winner</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Ajax</td>
<td style="text-align: center;">Utrecht</td>
<td style="text-align: center;">2.90</td>
<td style="text-align: center;">0.80</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">Ajax</td>
</tr>
<tr class="even">
<td>2</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">AZ Alkmaar</td>
<td style="text-align: center;">PSV Eindhoven</td>
<td style="text-align: center;">1.30</td>
<td style="text-align: center;">1.90</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">PSV Eindhoven</td>
</tr>
<tr class="odd">
<td>3</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Groningen</td>
<td style="text-align: center;">For Sittard</td>
<td style="text-align: center;">2.10</td>
<td style="text-align: center;">1.10</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Groningen</td>
</tr>
<tr class="even">
<td>4</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Feyenoord</td>
<td style="text-align: center;">Den Haag</td>
<td style="text-align: center;">2.70</td>
<td style="text-align: center;">0.90</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">Feyenoord</td>
</tr>
<tr class="odd">
<td>5</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Heerenveen</td>
<td style="text-align: center;">NAC Breda</td>
<td style="text-align: center;">2.20</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">Heerenveen</td>
</tr>
<tr class="even">
<td>6</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Vitesse</td>
<td style="text-align: center;">Graafschap</td>
<td style="text-align: center;">2.60</td>
<td style="text-align: center;">0.90</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">Vitesse</td>
</tr>
<tr class="odd">
<td>7</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Willem II</td>
<td style="text-align: center;">FC Emmen</td>
<td style="text-align: center;">2.10</td>
<td style="text-align: center;">1.10</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Willem II</td>
</tr>
<tr class="even">
<td>8</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Zwolle</td>
<td style="text-align: center;">VVV Venlo</td>
<td style="text-align: center;">1.90</td>
<td style="text-align: center;">1.20</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Zwolle</td>
</tr>
<tr class="odd">
<td>9</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Heracles</td>
<td style="text-align: center;">Excelsior</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.10</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Heracles</td>
</tr>
<tr class="even">
<td>10</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Den Haag</td>
<td style="text-align: center;">Willem II</td>
<td style="text-align: center;">1.70</td>
<td style="text-align: center;">1.30</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Den Haag</td>
</tr>
<tr class="odd">
<td>11</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Graafschap</td>
<td style="text-align: center;">Ajax</td>
<td style="text-align: center;">0.80</td>
<td style="text-align: center;">3.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">Ajax</td>
</tr>
<tr class="even">
<td>12</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Utrecht</td>
<td style="text-align: center;">Heerenveen</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.20</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Utrecht</td>
</tr>
<tr class="odd">
<td>13</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">NAC Breda</td>
<td style="text-align: center;">Zwolle</td>
<td style="text-align: center;">1.40</td>
<td style="text-align: center;">1.70</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Zwolle</td>
</tr>
<tr class="even">
<td>14</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">PSV Eindhoven</td>
<td style="text-align: center;">Heracles</td>
<td style="text-align: center;">3.10</td>
<td style="text-align: center;">0.70</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">PSV Eindhoven</td>
</tr>
<tr class="odd">
<td>15</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">FC Emmen</td>
<td style="text-align: center;">Groningen</td>
<td style="text-align: center;">1.30</td>
<td style="text-align: center;">1.70</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Groningen</td>
</tr>
<tr class="even">
<td>16</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Excelsior</td>
<td style="text-align: center;">AZ Alkmaar</td>
<td style="text-align: center;">1.20</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">AZ Alkmaar</td>
</tr>
<tr class="odd">
<td>17</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">For Sittard</td>
<td style="text-align: center;">Feyenoord</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">2.30</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">Feyenoord</td>
</tr>
<tr class="even">
<td>18</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">VVV Venlo</td>
<td style="text-align: center;">Vitesse</td>
<td style="text-align: center;">1.30</td>
<td style="text-align: center;">1.70</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Vitesse</td>
</tr>
</tbody>
</table>
<p>These predictions are perfectly useful if you want to bet on the likely winner of each game. However, they do not reflect how the actual endgame will play out; for example, there is not a single draw in the <code>predicted_winner</code> column. So, finally, let’s look at a <em>possible</em> version of the Eredivisie endgame by displaying the simulated match results calculated earlier.</p>
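<p>As an aside on why no draw appears among the predicted winners: if the predicted winner is the single most frequent simulated result per match, a draw will rarely be reported, even when draws happen fairly often. A rough sketch, assuming independent Poisson goal counts with fixed rates 1.3 and 1.9 (the rounded predicted means for AZ Alkmaar vs PSV Eindhoven from the table above). The fixed rates, seed, and variable names are assumptions for illustration; the actual model also carries the parameter uncertainty from the MCMC samples, which this sketch ignores.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)  # arbitrary seed, for reproducibility only
n_sim &lt;- 10000

# Assumed fixed scoring rates (rounded means from the forecast table)
home_goals &lt;- rpois(n_sim, lambda = 1.3)
away_goals &lt;- rpois(n_sim, lambda = 1.9)

# Classify every simulated match
result &lt;- ifelse(home_goals &gt; away_goals, "home_win",
                 ifelse(home_goals &lt; away_goals, "away_win", "draw"))

# Share of each outcome, and the single most frequent outcome
shares &lt;- prop.table(table(result))
round(shares, 2)
names(which.max(shares))</code></pre></div>
</div>
<p>Under these assumptions a draw turns up in roughly one in five simulated matches, yet the away win remains the most frequent outcome, so a mode-based column would still name PSV Eindhoven as the winner.</p>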
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">eredivisie_sim <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> eredivisie[<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(eredivisie<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeGoals), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Season"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Week"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HomeTeam"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AwayTeam"</span>)]</span>
<span id="cb30-2">eredivisie_sim<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>home_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> m3_forecast[,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rand_home_goal"</span>] </span>
<span id="cb30-3">eredivisie_sim<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>away_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> m3_forecast[,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rand_away_goal"</span>]</span>
<span id="cb30-4">eredivisie_sim<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>winner <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(m3_forecast[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rand_match_result"</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>HomeTeam, </span>
<span id="cb30-5">                            <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(m3_forecast[ , <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rand_match_result"</span>] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, eredivisie_forecast<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>AwayTeam, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Draw"</span>))</span>
<span id="cb30-6"></span>
<span id="cb30-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rownames</span>(eredivisie_sim) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb30-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xtable</span>(eredivisie_sim, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cccccccc"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"html"</span>)</span></code></pre></div></div>
</div>
<table class="caption-top table">
<colgroup>
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th style="text-align: center;">Season</th>
<th style="text-align: center;">Week</th>
<th style="text-align: center;">HomeTeam</th>
<th style="text-align: center;">AwayTeam</th>
<th style="text-align: center;">home_goals</th>
<th style="text-align: center;">away_goals</th>
<th style="text-align: center;">winner</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Ajax</td>
<td style="text-align: center;">Utrecht</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Draw</td>
</tr>
<tr class="even">
<td>2</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">AZ Alkmaar</td>
<td style="text-align: center;">PSV Eindhoven</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">PSV Eindhoven</td>
</tr>
<tr class="odd">
<td>3</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Groningen</td>
<td style="text-align: center;">For Sittard</td>
<td style="text-align: center;">3.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Groningen</td>
</tr>
<tr class="even">
<td>4</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Feyenoord</td>
<td style="text-align: center;">Den Haag</td>
<td style="text-align: center;">4.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Feyenoord</td>
</tr>
<tr class="odd">
<td>5</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Heerenveen</td>
<td style="text-align: center;">NAC Breda</td>
<td style="text-align: center;">3.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">Heerenveen</td>
</tr>
<tr class="even">
<td>6</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Vitesse</td>
<td style="text-align: center;">Graafschap</td>
<td style="text-align: center;">4.00</td>
<td style="text-align: center;">3.00</td>
<td style="text-align: center;">Vitesse</td>
</tr>
<tr class="odd">
<td>7</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Willem II</td>
<td style="text-align: center;">FC Emmen</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">Draw</td>
</tr>
<tr class="even">
<td>8</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Zwolle</td>
<td style="text-align: center;">VVV Venlo</td>
<td style="text-align: center;">4.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">Zwolle</td>
</tr>
<tr class="odd">
<td>9</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Heracles</td>
<td style="text-align: center;">Excelsior</td>
<td style="text-align: center;">4.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">Heracles</td>
</tr>
<tr class="even">
<td>10</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Den Haag</td>
<td style="text-align: center;">Willem II</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Draw</td>
</tr>
<tr class="odd">
<td>11</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Graafschap</td>
<td style="text-align: center;">Ajax</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Ajax</td>
</tr>
<tr class="even">
<td>12</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Utrecht</td>
<td style="text-align: center;">Heerenveen</td>
<td style="text-align: center;">5.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">Utrecht</td>
</tr>
<tr class="odd">
<td>13</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">NAC Breda</td>
<td style="text-align: center;">Zwolle</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">1.00</td>
<td style="text-align: center;">NAC Breda</td>
</tr>
<tr class="even">
<td>14</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">PSV Eindhoven</td>
<td style="text-align: center;">Heracles</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">Draw</td>
</tr>
<tr class="odd">
<td>15</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">FC Emmen</td>
<td style="text-align: center;">Groningen</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">2.00</td>
<td style="text-align: center;">Groningen</td>
</tr>
<tr class="even">
<td>16</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">Excelsior</td>
<td style="text-align: center;">AZ Alkmaar</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">3.00</td>
<td style="text-align: center;">AZ Alkmaar</td>
</tr>
<tr class="odd">
<td>17</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">For Sittard</td>
<td style="text-align: center;">Feyenoord</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">3.00</td>
<td style="text-align: center;">Feyenoord</td>
</tr>
<tr class="even">
<td>18</td>
<td style="text-align: center;">2018/2019</td>
<td style="text-align: center;">40</td>
<td style="text-align: center;">VVV Venlo</td>
<td style="text-align: center;">Vitesse</td>
<td style="text-align: center;">3.00</td>
<td style="text-align: center;">0.00</td>
<td style="text-align: center;">VVV Venlo</td>
</tr>
</tbody>
</table>
<p>Now we see a number of games resulting in a draw. We also see that Ajax and FC Utrecht tie in round 33, which puts PSV on top of the leaderboard! However, in round 34 the picture is reversed when PSV and Heracles tie, against all odds. So, in the end, Ajax wins the competition in this <em>possible</em> version of the Eredivisie endgame thanks to its better goal difference.</p>
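<p>The final ranking in this simulated endgame can be checked by hand. A small sketch, assuming the usual 3 points for a win and 1 for a draw, with the starting totals from the text and the four simulated results of Ajax and PSV from the table (the data frame and column names are made up for illustration):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># Standing before round 33, as stated in the text
standing &lt;- data.frame(
  team      = c("Ajax", "PSV Eindhoven"),
  points    = c(80, 80),
  goal_diff = c(81, 71)
)

# Simulated results of the two title contenders (from the table above)
# Ajax: 1-1 vs Utrecht (draw), 1-0 at Graafschap (win)
# PSV:  2-0 at AZ Alkmaar (win), 2-2 vs Heracles (draw)
points_gained    &lt;- c(1 + 3, 3 + 1)
goal_diff_gained &lt;- c(0 + 1, 2 + 0)

standing$points    &lt;- standing$points + points_gained
standing$goal_diff &lt;- standing$goal_diff + goal_diff_gained

# Rank by points first, then by goal difference
standing[order(-standing$points, -standing$goal_diff), ]</code></pre></div>
</div>
<p>Both teams end on 84 points, and Ajax stays ahead with a goal difference of 82 against 73 for PSV.</p>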
</section>
<section id="betting-on-the-match-outcome" class="level3">
<h3 class="anchored" data-anchor-id="betting-on-the-match-outcome">Betting on the match outcome</h3>
<p>One of the strengths of Bayesian modeling and MCMC sampling is that once you have the MCMC samples of the parameters, it is straightforward to calculate any quantity derived from these estimates while still retaining the uncertainty of the parameter estimates. So let’s look at the predicted distribution of the number of goals for the AZ Alkmaar vs PSV Eindhoven game and see if I can use my model to make some money. I’ll start by using the MCMC samples to calculate the distribution of the number of goals for AZ Alkmaar and PSV Eindhoven.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">plot_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(home_goals, away_goals) { </span>
<span id="cb31-2">  old_par <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mar =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>))</span>
<span id="cb31-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mfrow =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mar=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb31-4">  </span>
<span id="cb31-5">    n_matches <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(home_goals) </span>
<span id="cb31-6">    goal_diff <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> home_goals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> away_goals </span>
<span id="cb31-7">    match_result <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(goal_diff <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"away_win"</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(goal_diff <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"home_win"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"equal"</span>)) </span>
<span id="cb31-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(home_goals, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb31-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(away_goals, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> (<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>) </span>
<span id="cb31-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hist</span>(goal_diff, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xlim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>)</span>
<span id="cb31-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">barplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(match_result)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>n_matches, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span>
<span id="cb31-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(old_par)</span>
<span id="cb31-13">}</span></code></pre></div></div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(ms3)</span>
<span id="cb32-2">home_team <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(teams <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AZ Alkmaar"</span>)</span>
<span id="cb32-3">away_team <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(teams <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PSV Eindhoven"</span>)</span>
<span id="cb32-4">season <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">which</span>(seasons <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(to_year<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">-1</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/"</span>,to_year))</span>
<span id="cb32-5">home_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skill"</span>, season, home_team)] </span>
<span id="cb32-6">away_skill <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"skill"</span>, season, away_team)]</span>
<span id="cb32-7">home_baseline <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"home_baseline"</span>, season)]</span>
<span id="cb32-8">away_baseline <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ms3[, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_name</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"away_baseline"</span>, season)]</span>
<span id="cb32-9"></span>
<span id="cb32-10">home_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rpois</span>(n, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(home_baseline <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> home_skill <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> away_skill))</span>
<span id="cb32-11">away_goals <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rpois</span>(n, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(away_baseline <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> away_skill <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> home_skill))</span></code></pre></div></div>
</div>
<p>The first two graphs below summarize these two distributions: for both AZ and PSV, scoring exactly one goal is the most likely outcome (the mode of both distributions equals 1). The third graph shows that the most likely goal difference is 0 or -1: either AZ and PSV draw (0), or PSV scores exactly one more goal than AZ (-1), in which case PSV wins the match.</p>
<p>The fourth graph shows the probabilities of a PSV win (‘away_win’), a draw (‘equal’) and an AZ win (‘home_win’). It underlines that a PSV win is the likely scenario, with a probability of more than 50%. The balance presumably tips in favor of PSV because of the one-goal PSV win that the third graph shows to be so likely. Note, however, that the probability that PSV will <em>not</em> turn out as the match winner (i.e.&nbsp;a draw or a loss) is still almost 50%.</p>
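<p>To make the step from simulated goals to result probabilities concrete, here is a minimal, self-contained sketch. The baseline and skill values are hypothetical placeholders chosen for illustration, not the posterior draws stored in <code>ms3</code>, and the names <code>lambda_home</code>, <code>lambda_away</code> and <code>score_prob</code> are introduced here only. For fixed scoring rates, the probabilities of the three match results can also be computed exactly from the Poisson densities, which gives a check on the simulation.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical log-scale parameters (assumed values, not estimates from the model)
home_baseline &lt;- 0.40
away_baseline &lt;- 0.10
home_skill &lt;- 0.05
away_skill &lt;- 0.35

# Expected number of goals per team, same functional form as in the code above
lambda_home &lt;- exp(home_baseline + home_skill - away_skill)
lambda_away &lt;- exp(away_baseline + away_skill - home_skill)

# Simulation: draw goals and compute the share of each match result
set.seed(42)
n_sim &lt;- 10000
sim_home &lt;- rpois(n_sim, lambda_home)
sim_away &lt;- rpois(n_sim, lambda_away)
simulated &lt;- c(home_win = mean(sim_home &gt; sim_away),
               equal    = mean(sim_home == sim_away),
               away_win = mean(sim_home &lt; sim_away))

# Exact check: joint probability of every score up to 15 goals per team
goals &lt;- 0:15
score_prob &lt;- outer(dpois(goals, lambda_home), dpois(goals, lambda_away))
exact &lt;- c(home_win = sum(score_prob[lower.tri(score_prob)]),  # rows (home) &gt; columns (away)
           equal    = sum(diag(score_prob)),
           away_win = sum(score_prob[upper.tri(score_prob)]))

round(rbind(simulated, exact), 3)</code></pre></div>
</div>
<p>In the model itself the scoring rates differ per posterior draw, so there is no single pair of rates to plug into the exact calculation; simulating one match per draw, as done above, averages over that uncertainty.</p>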
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">old_par <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mfrow =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mar=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span>
<span id="cb33-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_goals</span>(home_goals, away_goals)</span></code></pre></div></div>
</div>
<p><img src="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/results/plot_goals-1.png" class="img-fluid" width="810"></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">par</span>(old_par)</span></code></pre></div></div>
</div>
<p>On May 10th, just before the start of competition round 33, the betting site <a href="http://www.williamhill.com/">William Hill</a> offered the following payouts (that is, how much I would get back if my bet was successful) on the outcome of this game, after 288 bets had been placed:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th style="text-align: center;">AZ</th>
<th style="text-align: center;">Draw</th>
<th style="text-align: right;">PSV</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">3.90</td>
<td style="text-align: center;">4.00</td>
<td style="text-align: right;">1.78</td>
</tr>
</tbody>
</table>
<p>Using my simulated distribution of the number of goals, I can calculate the payouts that the model predicts, taking the payout for an outcome to be the inverse of its simulated probability. These model payouts turn out to be close to the payouts that William Hill offers.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">AZ =</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(home_goals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> away_goals), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Draw =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(home_goals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> away_goals), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">PSV =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(home_goals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> away_goals))</span></code></pre></div></div>
</div>
<pre><code>###       AZ     Draw      PSV 
### 3.928759 4.332756 1.943005</code></pre>
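<p>A bookmaker’s payouts can be read the same way in reverse: the inverse of a payout is the probability implied by it. Below is a small sketch using the William Hill payouts from the table above. The variable names are my own, and the proportional rescaling at the end is just one simple convention for removing the bookmaker’s margin, not something William Hill publishes.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># William Hill payouts quoted in the table above
wh_payout &lt;- c(AZ = 3.90, Draw = 4.00, PSV = 1.78)

# Probabilities implied by the payouts; they sum to more than 1
implied_prob &lt;- 1 / wh_payout
overround &lt;- sum(implied_prob) - 1   # the bookmaker's margin, roughly 0.068 here

# Assumption: spread the margin proportionally over the three outcomes
margin_free_prob &lt;- implied_prob / sum(implied_prob)
margin_free_payout &lt;- 1 / margin_free_prob

round(rbind(wh_payout, margin_free_payout), 2)</code></pre></div>
</div>
<p>The model’s probabilities sum to 1 by construction, so the margin-free payouts are arguably the fairer yardstick when judging how close the model is to the bookmaker.</p>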
<p>Looking ahead to the correct-score bets discussed in the next section: the most likely result is 1 - 1, with a predicted payout of 9.50, compared to the William Hill payout of 7.50 for this bet. William Hill thus considers a 1 - 1 draw even likelier than our model does. If we want to earn some extra money, we should instead bet on a 1 - 0 win for AZ, as the William Hill payout is 19 while our model predicts only 18.30.</p>
</section>
<section id="betting-on-the-correct-score" class="level3">
<h3 class="anchored" data-anchor-id="betting-on-the-correct-score">Betting on the correct score</h3>
<p>It is also possible to bet on the final score, so let’s calculate what payouts my model predicts for different scores. The payouts that William Hill reports are:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th></th>
<th style="text-align: center;">PSV 0</th>
<th style="text-align: center;">PSV 1</th>
<th style="text-align: center;">PSV 2</th>
<th style="text-align: center;">PSV 3</th>
<th style="text-align: center;">PSV 4</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>AZ 0</td>
<td style="text-align: center;">21</td>
<td style="text-align: center;">12</td>
<td style="text-align: center;">12</td>
<td style="text-align: center;">17</td>
<td style="text-align: center;">29</td>
</tr>
<tr class="even">
<td>AZ 1</td>
<td style="text-align: center;">19</td>
<td style="text-align: center;">7.5</td>
<td style="text-align: center;">12</td>
<td style="text-align: center;">12</td>
<td style="text-align: center;">21</td>
</tr>
<tr class="odd">
<td>AZ 2</td>
<td style="text-align: center;">23</td>
<td style="text-align: center;">13</td>
<td style="text-align: center;">11</td>
<td style="text-align: center;">17</td>
<td style="text-align: center;">29</td>
</tr>
<tr class="even">
<td>AZ 3</td>
<td style="text-align: center;">41</td>
<td style="text-align: center;">26</td>
<td style="text-align: center;">23</td>
<td style="text-align: center;">29</td>
<td style="text-align: center;">51</td>
</tr>
<tr class="odd">
<td>AZ 4</td>
<td style="text-align: center;">81</td>
<td style="text-align: center;">51</td>
<td style="text-align: center;">66</td>
<td style="text-align: center;">126</td>
<td style="text-align: center;">81</td>
</tr>
</tbody>
</table>
<p>It follows that the 1 - 1 draw is also the most likely result according to William Hill, as it has the lowest payout (7.5). Next, we calculate the payouts that our model predicts.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1">goals_payout <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">laply</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(home_goal) {</span>
<span id="cb37-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">laply</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(away_goal) {</span>
<span id="cb37-3">    <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(home_goals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> home_goal <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> away_goals  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> away_goal)</span>
<span id="cb37-4">  })</span>
<span id="cb37-5">})</span>
<span id="cb37-6"></span>
<span id="cb37-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">colnames</span>(goals_payout) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"PSV Eindhoven"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" - "</span>)</span>
<span id="cb37-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rownames</span>(goals_payout) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"AZ Alkmaar"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" - "</span>)</span>
<span id="cb37-9">goals_payout <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(goals_payout, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb37-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">xtable</span>(goals_payout, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">align=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cccccccc"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"html"</span>)</span></code></pre></div></div>
</div>
<table class="caption-top table">
<colgroup>
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th style="text-align: center;">PSV Eindhoven - 0</th>
<th style="text-align: center;">PSV Eindhoven - 1</th>
<th style="text-align: center;">PSV Eindhoven - 2</th>
<th style="text-align: center;">PSV Eindhoven - 3</th>
<th style="text-align: center;">PSV Eindhoven - 4</th>
<th style="text-align: center;">PSV Eindhoven - 5</th>
<th style="text-align: center;">PSV Eindhoven - 6</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>AZ Alkmaar - 0</td>
<td style="text-align: center;">20.70</td>
<td style="text-align: center;">12.20</td>
<td style="text-align: center;">13.00</td>
<td style="text-align: center;">21.20</td>
<td style="text-align: center;">45.20</td>
<td style="text-align: center;">120.00</td>
<td style="text-align: center;">306.10</td>
</tr>
<tr class="even">
<td>AZ Alkmaar - 1</td>
<td style="text-align: center;">18.30</td>
<td style="text-align: center;">9.50</td>
<td style="text-align: center;">10.40</td>
<td style="text-align: center;">16.90</td>
<td style="text-align: center;">36.30</td>
<td style="text-align: center;">94.30</td>
<td style="text-align: center;">238.10</td>
</tr>
<tr class="odd">
<td>AZ Alkmaar - 2</td>
<td style="text-align: center;">29.60</td>
<td style="text-align: center;">15.90</td>
<td style="text-align: center;">16.30</td>
<td style="text-align: center;">27.10</td>
<td style="text-align: center;">54.50</td>
<td style="text-align: center;">168.50</td>
<td style="text-align: center;">625.00</td>
</tr>
<tr class="even">
<td>AZ Alkmaar - 3</td>
<td style="text-align: center;">67.00</td>
<td style="text-align: center;">36.10</td>
<td style="text-align: center;">38.80</td>
<td style="text-align: center;">70.80</td>
<td style="text-align: center;">153.10</td>
<td style="text-align: center;">441.20</td>
<td style="text-align: center;">1363.60</td>
</tr>
<tr class="odd">
<td>AZ Alkmaar - 4</td>
<td style="text-align: center;">208.30</td>
<td style="text-align: center;">112.80</td>
<td style="text-align: center;">135.10</td>
<td style="text-align: center;">214.30</td>
<td style="text-align: center;">454.50</td>
<td style="text-align: center;">1363.60</td>
<td style="text-align: center;">3750.00</td>
</tr>
<tr class="even">
<td>AZ Alkmaar - 5</td>
<td style="text-align: center;">937.50</td>
<td style="text-align: center;">428.60</td>
<td style="text-align: center;">625.00</td>
<td style="text-align: center;">714.30</td>
<td style="text-align: center;">1875.00</td>
<td style="text-align: center;">7500.00</td>
<td style="text-align: center;">15000.00</td>
</tr>
<tr class="odd">
<td>AZ Alkmaar - 6</td>
<td style="text-align: center;">3750.00</td>
<td style="text-align: center;">2142.90</td>
<td style="text-align: center;">7500.00</td>
<td style="text-align: center;">3000.00</td>
<td style="text-align: center;">7500.00</td>
<td style="text-align: center;">Inf</td>
<td style="text-align: center;">Inf</td>
</tr>
</tbody>
</table>
<p>The most likely result is 1 - 1 with a predicted payout of 9.50, which can be compared to the William Hill payout of 7.50 for this bet. Since William Hill pays less than the payout our model considers fair, betting on this score would not earn us extra money according to the model.</p>
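<p>Whether a bet is attractive under the model can be made explicit with an expected-value calculation: a stake of 1 returns the bookmaker’s payout with the model’s probability, so the expected profit is the payout times that probability, minus 1. The sketch below applies this to the scores that appear in both tables above, using the rounded table values. Names such as <code>wh</code>, <code>model</code> and <code>expected_profit</code> are my own. Because the model payouts come from a single simulation run, small edges should be read with caution.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">scores &lt;- 0:4
labels &lt;- list(paste("AZ", scores), paste("PSV", scores))

# William Hill payouts (rows: AZ goals, columns: PSV goals), copied from the first table
wh &lt;- matrix(c(21, 12, 12, 17, 29,
               19, 7.5, 12, 12, 21,
               23, 13, 11, 17, 29,
               41, 26, 23, 29, 51,
               81, 51, 66, 126, 81),
             nrow = 5, byrow = TRUE, dimnames = labels)

# Model payouts for the same scores, copied from the second table
model &lt;- matrix(c(20.7, 12.2, 13.0, 21.2, 45.2,
                  18.3, 9.5, 10.4, 16.9, 36.3,
                  29.6, 15.9, 16.3, 27.1, 54.5,
                  67.0, 36.1, 38.8, 70.8, 153.1,
                  208.3, 112.8, 135.1, 214.3, 454.5),
                nrow = 5, byrow = TRUE, dimnames = labels)

# Expected profit per unit staked: bookmaker payout times model probability, minus the stake
expected_profit &lt;- wh * (1 / model) - 1
round(expected_profit, 2)

# Scores for which the bookmaker pays more than the model considers fair
which(expected_profit &gt; 0, arr.ind = TRUE)</code></pre></div>
</div>
<p>With these numbers the 1 - 1 draw has a negative expected profit, while a 1 - 0 win for AZ comes out slightly positive.</p>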


</section>
</section>


<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2019,
  author = {Stam, Piet},
  title = {Modeling Match Results in the {Dutch} {Eredivisie} Using a
    Hierarchical {Bayesian} {Poisson} Model},
  date = {2019-05-10},
  url = {https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2019" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2019. <span>“Modeling Match Results in the Dutch Eredivisie
Using a Hierarchical Bayesian Poisson Model.”</span> May 10, 2019. <a href="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/">https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/</a>.
</div></div></section></div> ]]></description>
  <category>sports analytics</category>
  <category>R</category>
  <guid>https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/</guid>
  <pubDate>Fri, 10 May 2019 18:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2019-05-10-bayesian-football-odds/feature.png" medium="image" type="image/png" height="154" width="144"/>
</item>
<item>
  <title>Personal Health Train: an application to risk equalization (in Dutch)</title>
  <dc:creator>Piet Stam</dc:creator>
  <link>https://www.pietstam.nl/posts/2019-02-19-personal-health-train-case-risk-equalization/</link>
  <description><![CDATA[ 






<blockquote class="blockquote">
<p>I published this blog post on <a href="https://www.linkedin.com/pulse/personal-health-train-casus-risicoverevening-piet-stam/">LinkedIn</a> on the date above; it is reproduced here in full to guarantee its public availability.</p>
</blockquote>
<p>For <a href="https://www.rijksoverheid.nl/documenten/brochures/2016/03/01/beschrijving-van-het-risicovereveningssysteem-van-de-zorgverzekeringswet">research</a> &amp; <a href="https://www.zorginstituutnederland.nl/financiering/risicoverevening-zvw/wat-is-risicoverevening">implementation</a> of risk equalization between health insurers, the health care use of all 17 million individual (!) Dutch residents is collected centrally in a national data file. Could this also be done in a decentralized way? Perhaps with the <a href="http://www.personalhealthtrain.nl/">Personal Health Train</a> (PHT), the concept that Minister Bruins <a href="https://www.rijksoverheid.nl/documenten/kamerstukken/2018/11/15/kamerbrief-over-data-laten-werken-voor-gezondheid">advocates</a> and that was developed by <a href="https://www.dtls.nl/fair-data/personal-health-train/">DTL</a>, <a href="https://www.linkedin.com/in/alajdekker/">MAASTRO</a> and <a href="https://www.linkedin.com/in/peter-bram-t-hoen-1b6b9a18/">LUMC</a>? My preliminary conclusion is: unfortunately not (yet?).</p>
<p>The intention is for the PHT to run on so-called <a href="https://en.wikipedia.org/wiki/Partition_(database)#Partitioning_methods">horizontally and vertically partitioned data files</a>. The national risk equalization data file is an example of a vertically partitioned data file. This roughly means that the information in the different columns comes from different data sources (read: locations). For example, the income information for the Socio-Economic Status (SES) characteristic is obtained from the Belastingdienst (the Dutch tax authority), while the information on an individual’s age is supplied by his or her health insurer. This yields a central data file with 17 million horizontal records and (among others) SES and age in the columns.</p>
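<p>The difference between the two kinds of partitioning can be illustrated with a toy example. The data below are made up, and the column names (<code>id</code>, <code>age</code>, <code>ses</code>) and source names are purely illustrative.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Made-up records for four persons (illustrative values only)
insurer_a &lt;- data.frame(id = c(1, 2), age = c(34, 71))
insurer_b &lt;- data.frame(id = c(3, 4), age = c(52, 18))
tax_office &lt;- data.frame(id = c(1, 2, 3, 4), ses = c("low", "high", "middle", "low"))

# Horizontal partitioning: each source holds different rows with the same columns,
# so the central file is obtained by stacking the rows
ages &lt;- rbind(insurer_a, insurer_b)

# Vertical partitioning: each source holds different columns for the same persons,
# so the central file is obtained by linking records on a shared identifier
central &lt;- merge(ages, tax_office, by = "id")
central</code></pre></div>
</div>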
<p>The PHT aims for the opposite of the usual way of working, in which data from different sources are first brought together in a central data file before researchers and algorithms can do their work. Instead, the PHT aims to leave the data at the source location and bring the researcher and the algorithms there. For the techies among us: you can program an algorithm in a Docker container and have it run on the data at the source location. For the non-techies among us: imagine a little train (carrying an algorithm) that travels along the various stations (i.e.&nbsp;source locations). This is nicely visualized in a <a href="https://vimeo.com/143245835">short video</a>.</p>
<p>For privacy reasons alone, it would be wonderful if the PHT made it possible for horizontally and vertically partitioned data files to stay at the source location. For horizontally partitioned data files, prototypes of the PHT are by now <a href="https://www.thegreenjournal.com/article/S0167-8140(16)34336-5/fulltext">available</a>, but for vertically partitioned data files the PHT is still in <a href="http://www.medra.org/servlet/aliasResolver?alias=iospressISBN&amp;isbn=978-1-61499-851-8&amp;spage=581&amp;doi=10.3233/978-1-61499-852-5-581">the experimental phase</a>. The algorithm is indeed brought to the source locations, but when data from different source locations have to be combined, the data still have to leave these source locations briefly (and simultaneously) before the algorithm can do its work. At that very moment there is, once again, (briefly) a central data file on which the computation is performed. And that is what we ideally want to avoid with the PHT.</p>
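<p>Why horizontally partitioned data are the easier case can be seen in a toy example: many statistics can be assembled from per-location summaries, so no individual record has to leave its source. The sketch below computes an overall mean this way. It only illustrates the idea with made-up numbers and hypothetical names (<code>stations</code>, <code>local_summary</code>); it is not the PHT implementation.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Made-up health care costs held at three separate source locations
stations &lt;- list(
  insurer_a = c(1200, 300, 4500),
  insurer_b = c(800, 150),
  insurer_c = c(2600, 90, 700, 1100)
)

# The "train": a function that runs at each location and returns only aggregates
local_summary &lt;- function(costs) c(total = sum(costs), n = length(costs))

# Only the per-location totals and counts travel back, never the individual records
summaries &lt;- sapply(stations, local_summary)
federated_mean &lt;- sum(summaries["total", ]) / sum(summaries["n", ])

# Same result as pooling all records centrally
pooled_mean &lt;- mean(unlist(stations))
c(federated = federated_mean, pooled = pooled_mean)</code></pre></div>
</div>
<p>For vertically partitioned data there is no equally simple per-location summary for statistics that relate columns held at different locations, such as the relation between SES and health care costs.</p>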
<p>There is no solution for this yet. Current thinking is to introduce a third party that retrieves the data from the locations and links them centrally without gaining access to the data itself. But that is not new: <a href="https://www.zorgttp.nl/">ZorgTTP</a> has long been doing this for the research on and implementation of risk equalization, by means of <a href="https://www.zorgttp.nl/pseudonimisatie/">pseudonymization</a>. However, this approach does not change the fact that data must (temporarily) leave the source locations before we can let researchers and algorithms do their work. ZorgTTP’s way of working already fits well with the PHT’s aspiration to process the data as close to the source locations as possible, but ideally this happens at the source locations themselves, just as the PHT prototypes manage to do for horizontally partitioned data files.</p>
<p>The key question is: how do we get the PHT running at full speed for data problems such as risk equalization? If we succeed, this would mean an enormous improvement and simplification of the annual processing of the individual health care use data of all of us. On Tuesday 19 March, <a href="https://www.linkedin.com/in/hans-van-vlaanderen-b2541b3/">Hans van Vlaanderen</a> (<a href="http://www.zorgttp.nl/">ZorgTTP</a>) and I will put this question forward in the “Making data work for health” session of <a href="https://www.commit-nl.nl/">COMMIT/</a> and <a href="https://www.dtls.nl/">DTL</a> at the <a href="https://ict-research.nl/ict-open/">ICT.OPEN2019</a> conference. Together with those present, we will look for new directions for a solution. Because how great would it be to get the PHT running at full speed for risk equalization data as well?</p>



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{stam2019,
  author = {Stam, Piet},
  title = {Personal {Health} {Train:} An Application to Risk
    Equalization (in {Dutch)}},
  date = {2019-02-19},
  url = {https://www.pietstam.nl/posts/2019-02-19-personal-health-train-case-risk-equalization/},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-stam2019" class="csl-entry quarto-appendix-citeas">
Stam, Piet. 2019. <span>“Personal Health Train: An Application to Risk
Equalization (in Dutch).”</span> February 19, 2019. <a href="https://www.pietstam.nl/posts/2019-02-19-personal-health-train-case-risk-equalization/">https://www.pietstam.nl/posts/2019-02-19-personal-health-train-case-risk-equalization/</a>.
</div></div></section></div> ]]></description>
  <category>risk equalization</category>
  <category>digital transformation</category>
  <guid>https://www.pietstam.nl/posts/2019-02-19-personal-health-train-case-risk-equalization/</guid>
  <pubDate>Tue, 19 Feb 2019 00:00:00 GMT</pubDate>
  <media:content url="https://www.pietstam.nl/posts/2019-02-19-personal-health-train-case-risk-equalization/feature_bw.png" medium="image" type="image/png" height="81" width="144"/>
</item>
</channel>
</rss>
