Hello World: Testing the Pipeline
A demo post that exercises every feature of the markdown pipeline — math, code, tables, and more.
This is a test post to verify the full rendering pipeline. If you can see properly formatted math, syntax-highlighted code, and a table below, everything is working.
## Inline and Display Math
Einstein's famous mass-energy equivalence is $E = mc^2$, and the gradient of a scalar field is $\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z} \right)$.
The Bellman optimality equation for the state-value function under a discounted infinite-horizon MDP is:

$$
V^*(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]
$$
And the corresponding action-value form:

$$
Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q^*(s', a')
$$
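The two are linked by one more identity, included here as an extra display-math check:

$$
V^*(s) = \max_{a} Q^*(s, a)
$$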
## Code Blocks
Here's a simple value iteration implementation in Python:
```python
import numpy as np

def value_iteration(P, R, gamma=0.99, theta=1e-8):
    # Assumed shapes: P is (S, A, S) transition probabilities, R is (S, A) rewards.
    V = np.zeros(P.shape[0])
    while True:
        V_new = np.max(R + gamma * P @ V, axis=1)
        if np.max(np.abs(V_new - V)) < theta:
            break
        V = V_new
    # Extract the greedy policy
    policy = np.argmax(R + gamma * P @ V, axis=1)
    return V, policy
```
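To sanity-check it, here is a minimal usage sketch on a hypothetical two-state, two-action MDP; the array shapes are my assumption about the function's inputs rather than something stated above.

```python
# Hypothetical toy MDP (2 states, 2 actions); continues from the block above.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions out of state 0, one row per action
    [[0.5, 0.5], [0.0, 1.0]],   # transitions out of state 1, one row per action
])
R = np.array([
    [1.0, 0.0],                 # reward for each action taken in state 0
    [0.0, 2.0],                 # reward for each action taken in state 1
])

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)   # expect state 1 to favor action 1, which pays 2.0
```

And a TypeScript utility for reading frontmatter: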
```typescript
import matter from "gray-matter";
import fs from "fs";

interface PostMeta {
  title: string;
  description?: string;
  tags: string[];
}

export function parsePost(filePath: string) {
  const raw = fs.readFileSync(filePath, "utf8");
  const { data, content } = matter(raw);
  return { meta: data as PostMeta, content };
}
```

## A Blockquote
> The reward hypothesis: all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called reward).
>
> — Rich Sutton
## Tables
| Algorithm | On/Off-Policy | Model-Free | Continuous Actions |
|---|---|---|---|
| Q-Learning | Off | Yes | No |
| SARSA | On | Yes | No |
| PPO | On | Yes | Yes |
| SAC | Off | Yes | Yes |
| DDPG | Off | Yes | Yes |
## Lists
Key components of a typical RL agent (a minimal sketch follows the list):
- A policy mapping states to actions
- A value function $V(s)$ or $Q(s, a)$
- Optionally, a model of the environment dynamics:
  - Transition function $P(s' \mid s, a)$
  - Reward function $R(s, a)$
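Pulling those pieces into one place, here is a rough, hypothetical container for them; the class and field names are illustrative rather than anything defined above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np

@dataclass
class Agent:
    # Hypothetical bundle of the components listed above; names are illustrative.
    policy: Callable[[int], int]                    # maps a state index to an action index
    value_fn: Optional[np.ndarray] = None           # V(s), indexed by state
    q_fn: Optional[np.ndarray] = None               # Q(s, a), indexed by state and action
    transition_model: Optional[np.ndarray] = None   # P(s' | s, a), shape (S, A, S)
    reward_model: Optional[np.ndarray] = None       # R(s, a), shape (S, A)
```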
Steps in the policy gradient derivation (written out in full after the list):
- Define the objective
- Apply the log-derivative trick
- Estimate the gradient with Monte Carlo samples
- Add a baseline to reduce variance
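Under the usual assumptions (trajectories $\tau \sim \pi_\theta$ with return $R(\tau)$, dynamics that do not depend on $\theta$, and a baseline $b$ that does not depend on the action), those steps chain together as:

$$
\begin{aligned}
J(\theta) &= \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right] \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\tau \sim \pi_\theta}\left[ \nabla_\theta \log p_\theta(\tau)\, R(\tau) \right]
= \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \right] \\
&\approx \frac{1}{N} \sum_{i=1}^{N} \sum_t \nabla_\theta \log \pi_\theta\left( a_t^{(i)} \mid s_t^{(i)} \right) \left( R(\tau^{(i)}) - b \right)
\end{aligned}
$$

Subtracting the baseline leaves the estimator unbiased, since $\mathbb{E}\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right] = 0$, while typically reducing its variance.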
That's everything. If the math renders, the code highlights, and the table aligns, you're good to go.