Yens Onboarding – Fall 2025

Welcome! This two-session onboarding will get you productive on Stanford GSB’s research computing cluster — the Yens, and ready to run real research workflows at scale.


Course objectives

By the end of the course, you will be able to:

  • Connect to the Yens via SSH and JupyterHub, navigate the filesystem, and move files to/from the cluster.
  • Create and use a reproducible Python environment, install packages, and register a Jupyter kernel.
  • Submit, monitor, and debug Slurm jobs; choose CPU/RAM/time resources; and read job logs.
  • Scale workloads with fault tolerance (checkpoint/resume) and Slurm arrays for parallelism.

Overview of topics

Day 1 — Getting started (interactive use)

  • SSH & JupyterHub access; basic shell; copying data with scp.
  • Clone the course repo; run scripts from the terminal.
  • Create/activate a virtual environment, install from requirements.txt, and expose your env to Jupyter.
  • Securely use environment variables with dotenv.
  • Motivation exercise: run a “mystery” Python script interactively, watch CPU/RAM/time with htop, userload, and time.
  • Your first Slurm job (hello-world) to close Day 1.

Day 2 — Running and scaling jobs

  • Running real scripts with Slurm: working directories, logs, monitoring with squeue, canceling jobs.
  • Debugging lab: fix common broken Slurm scripts and learn a systematic log-first workflow.
  • Handling resource failures (out-of-memory, time limits) and how to size jobs.
  • Fault tolerance: run a checkpointed batch job that saves progress and resumes after fixes.
  • Scaling up: Slurm arrays to run many tasks in parallel; per-task outputs and follow-up consolidation.
  • Sharing results off-cluster and communicating your workflow (README updates).

Expectations

What we assume

  • You’ve done some programming (Python preferred). If not, tell us and we’ll get you unstuck.
  • You can bring a laptop with Duo for Stanford login and a modern browser.

Setup you’ll complete in class

  • SSH/JupyterHub access and file transfer.
  • Local repo clone and a working venv linked to Jupyter.
  • A first Slurm submission on Day 1; full Slurm workflow on Day 2.

Legend we will use

  • 💻: means “use terminal on the Yens”
  • ✏️ : means “we will white board this”
  • 🐍: means “Python script”
  • ❓: question for class. Feel free to shout out the answer
  • 🟩 / 🟥: means “put up the colored sticky once you finish the exercise / ask for help”

What to do during class

  • Work through each hands-on step in order; put up 🟩 when done, 🟥 for help (we’ll circulate).
  • Ask questions out loud—especially when something fails. We’ll use failures to practice debugging.

How the course is run

  • Format: short demos → guided hands-on exercises → quick recaps.
  • Timing: Two sessions of ~2:45 each (including breaks), with clearly scoped exercises and checkpoints aligned to the Day 1/Day 2 pages.
  • Materials: All commands and scripts live in the course repo you’ll clone on Day 1.

Support & resources

  • Need help beyond class? Check RCpedia and reach us on Slack.

Contact us


The DARC Team

DARC Team September 2024DARC Team at Cantor Art Museum
DARC TeamDARC Team at Art Museum