Why Curriculum Matters

Starting a robot learning program with complex tasks is the most common — and most expensive — mistake teams make. A policy trained from scratch on a task like bimanual cloth folding will fail to learn anything useful in its first 2,000 demonstrations. The task space is too large, the reward signal too sparse, and the network has no prior skills to build on.

A well-designed curriculum sequences tasks from primitive skills to complex compositions, where each stage builds on skills acquired in the previous stage. The result: policies that learn faster, generalize better, and require fewer total demonstrations. Teams using curriculum learning at SVRC have reduced total demonstration requirements by 40-60% for complex manipulation tasks compared to direct end-to-end training.

The concept borrows directly from human motor learning. You do not teach a child to juggle before they can catch. In robotics, the theory is formalized as curriculum learning (Bengio et al., 2009): training on examples ordered by increasing difficulty produces better generalization than random ordering, even with identical total training data.

Curriculum Learning Theory for Robotics

Curriculum learning in the robot manipulation context rests on three theoretical pillars, each with practical implications for how you sequence tasks and allocate demonstration budgets.

Easy-to-Hard Ordering

The foundational principle: present easier tasks first, progressively introducing harder variants. For manipulation, "easy" means fewer contact transitions, shorter horizon, less visual variation, and more constrained initial conditions. A concrete ordering for a general manipulation curriculum:

  1. Fixed-position grasps — object always in the same pose, same lighting, same background. The policy learns the basic visuomotor mapping.
  2. Varied-position grasps — object position randomized within a 20 cm x 20 cm region. The policy learns position invariance.
  3. Varied-object grasps — multiple object shapes at varied positions. The policy learns shape-adaptive grasping.
  4. Sequential manipulation — pick, transport, place. Multiple state transitions.
  5. Contact-rich tasks — insertion, assembly, force-sensitive operations.

Each stage should reach its advancement threshold before moving on: roughly 90-95% success for early stages, relaxing toward 75-80% at L4-L5 (see the demonstration scaling table later in this guide). Advancing too early wastes demonstrations on a policy that lacks the foundation to learn the harder task efficiently.

Automatic Curriculum Methods

Manual curriculum design requires human judgment about task difficulty ordering. Automatic curriculum methods let the learning algorithm itself decide which tasks to train on next, based on the policy's current competence.

Method | How It Works | Best For | Limitation
Self-paced learning | Up-weight training samples where the policy's loss is low (already learned) or in a "learning zone" (moderate loss) | Large, heterogeneous demonstration datasets | Requires all data upfront; no active collection
Competence-based progression | Evaluate policy on current stage every N episodes; advance when success rate > threshold | Active data collection pipelines (SVRC default) | Requires evaluation infrastructure between stages
Hindsight relabeling | Failed demonstrations are relabeled as successes for easier goals (e.g., "reach to where the object ended up") | Goal-conditioned policies; RL fine-tuning | Not directly applicable to pure IL without an RL component
Domain randomization scheduling | Gradually increase visual/physical randomization range during training | Sim-to-real transfer; robustness training | Requires a simulation environment

In practice, SVRC uses competence-based progression for active data collection projects: collect demonstrations at the current curriculum stage, train, evaluate, and only advance when the success threshold is met. This prevents wasting expensive operator time collecting demonstrations the policy cannot yet learn from.
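The competence-based loop described above can be sketched as follows. This is a minimal illustration, not production code: the collect, train, and evaluate callables are placeholders for your own pipeline, and the threshold schedule mirrors the per-level advancement thresholds given later in this guide.

```python
# Sketch of competence-based progression: collect a batch, train, evaluate,
# and advance only when the stage's success threshold is met. The collect/
# train/evaluate callables are hypothetical placeholders.

ADVANCE_THRESHOLD = {1: 0.95, 2: 0.90, 3: 0.85, 4: 0.80, 5: 0.75}

def run_curriculum(stages, collect, train, evaluate, batch=200, max_rounds=20):
    policy = None
    for stage in stages:
        threshold = ADVANCE_THRESHOLD[stage]
        for _ in range(max_rounds):
            demos = collect(stage, batch)        # operator time is spent here
            policy = train(policy, demos)        # warm-start from prior stage
            if evaluate(policy, stage) >= threshold:
                break                            # competence met: advance
        else:
            raise RuntimeError(f"stage L{stage} plateaued below {threshold:.0%}")
    return policy
```

The key property is that demonstration collection for a stage stops as soon as the threshold is cleared, which is exactly what prevents wasted operator time.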

Failure Case Injection

A curriculum that only contains successful demonstrations produces a brittle policy. When the deployed policy encounters a novel situation and partially fails, it has never seen a recovery trajectory and enters an out-of-distribution state that typically cascades to complete failure.

Deliberate failure injection addresses this by including 10-20% failure-and-recovery demonstrations at each curriculum stage:

  • Grasp failures: Operator intentionally performs a weak grasp, detects the slip, re-grasps. The policy learns slip detection and recovery.
  • Positioning errors: Operator approaches the object slightly off-center, corrects mid-approach. The policy learns correction behavior.
  • Object perturbation: A second person moves the object during the approach. The operator re-plans. The policy learns reactive re-planning.
  • Partial task completion: Operator completes 70% of the task, the object drops, operator recovers and completes. The policy learns mid-task recovery.

Include failure injection starting at curriculum stage L2. At L1 (fixed primitive), failures are rare and not informative. At L2+, failure recovery is a critical skill that significantly improves production robustness.
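One way to enforce the 10-20% ratio is to mix recovery demonstrations into the training set at a target fraction. This is a sketch under the assumption that demonstrations are simply list elements; the function name and seed handling are illustrative.

```python
import random

def mix_failure_demos(success_demos, recovery_demos, failure_frac=0.15, seed=0):
    """Build a training set in which `failure_frac` of the episodes are
    failure-and-recovery demonstrations (10-20% per the guideline)."""
    rng = random.Random(seed)
    # Solve F / (S + F) = failure_frac for F given S successes.
    n_fail = round(failure_frac * len(success_demos) / (1 - failure_frac))
    n_fail = min(n_fail, len(recovery_demos))
    mixed = success_demos + rng.sample(recovery_demos, n_fail)
    rng.shuffle(mixed)
    return mixed
```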

Curriculum Design Principles

  • Start with primitive skills (reaching, grasping): A "reach to object" policy and a "grasp stable object" policy are the foundation for virtually every manipulation task. Train these to high competence (>95% success) before composing them into multi-step tasks.
  • Compose into complex tasks: Once primitive skills are trained, complex tasks are learned much faster because the policy already has the relevant perceptual and motor sub-skills. A "place cup in tray" task trained once reaching and grasping policies exist needs only 300-500 demonstrations to learn the composition; trained from scratch it requires 2,000-5,000.
  • Reuse demonstrations across tasks: Demonstrations of primitive skills can be reused as pretraining data for all tasks that require those skills. This amortizes the cost of collecting high-quality primitive demonstrations.
  • Measure primitive skill quality strictly: A primitive skill that only works 80% of the time will compound badly in a composed task. Target >95% success on each primitive before advancing to composition.

Task Decomposition Strategies

Every complex manipulation task can be decomposed into a sequence of primitive actions. The decomposition determines how you structure your curriculum, allocate demonstration budgets, and compose policies. Three decomposition strategies, ranked by implementation complexity:

Strategy 1: Sequential Primitives (Simplest)

Break the task into a fixed sequence of primitives executed one after another with explicit handoff conditions.

  • Example — Cup stacking: REACH(cup_A) → GRASP(cup_A) → TRANSPORT(cup_A, above_cup_B) → PLACE(cup_A, on_cup_B) → RELEASE
  • Handoff condition: Each primitive declares a success condition (e.g., gripper closed AND gripper force > 2N for GRASP). The next primitive begins only when the previous succeeds.
  • Advantages: Easy to debug (failure is localized to one primitive), each primitive can be trained independently, primitives are reusable.
  • Limitations: Rigid sequencing cannot handle tasks where the order is context-dependent or where primitives overlap in time.
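Strategy 1 can be sketched as a plain executor over (primitive, handoff condition) pairs. The primitive implementations and the state dictionary here are hypothetical; the gripper-force handoff mirrors the GRASP example above.

```python
# Sketch of Strategy 1: a fixed primitive sequence with explicit handoff
# conditions. Each primitive runs to completion; the next begins only if
# the handoff condition holds, so failures localize to one primitive.

def run_sequence(steps, state):
    """`steps` is a list of (primitive_fn, handoff_condition) pairs."""
    for i, (primitive, handoff_ok) in enumerate(steps):
        state = primitive(state)
        if not handoff_ok(state):
            return state, f"failed at step {i}: {primitive.__name__}"
    return state, "success"

# Toy illustration with placeholder primitives:
def reach(state):
    state["at_pregrasp"] = True
    return state

def grasp(state):
    state["grip_force_n"] = 3.0   # pretend force sensor reading
    return state

steps = [
    (reach, lambda s: s.get("at_pregrasp", False)),
    (grasp, lambda s: s.get("grip_force_n", 0.0) > 2.0),  # "force > 2 N"
]
```

Because every handoff is an explicit predicate, a failed run reports exactly which primitive to debug or re-collect data for.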

Strategy 2: Hierarchical Policy (Moderate)

A high-level policy selects which primitive to execute based on the current observation. Primitives are pre-trained; only the high-level policy is trained on the full task.

  • Example — Sorting task: The high-level policy observes the scene, identifies the next object to sort, and selects REACH → GRASP → PLACE_BIN_A or REACH → GRASP → PLACE_BIN_B.
  • Training: Pre-train all primitives independently (curriculum stages L1-L2). Then train the high-level policy with demonstrations of the full sorting task, where the high-level actions are primitive selections, not raw joint commands.
  • Advantages: Handles variable-length tasks and conditional branching. High-level policy requires far fewer demonstrations (50-200) than training end-to-end.
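A hierarchical runner can be sketched as a loop in which the high-level policy emits primitive names rather than joint commands. The observation encoding, primitive library, and termination token below are hypothetical placeholders.

```python
# Sketch of Strategy 2: the high-level policy selects which pre-trained
# primitive to execute next, until it signals completion.

def run_hierarchical(high_level_policy, primitives, obs, max_steps=20):
    """`primitives` maps names (e.g. "PLACE_BIN_A") to pre-trained skills;
    each skill executes fully and returns the updated observation."""
    for _ in range(max_steps):
        choice = high_level_policy(obs)     # e.g. "GRASP" or "PLACE_BIN_A"
        if choice == "DONE":
            return obs, True
        obs = primitives[choice](obs)       # pre-trained primitive runs here
    return obs, False                       # budget exhausted
```

Since high-level actions are drawn from a small discrete set, the selection policy needs far fewer demonstrations than an end-to-end joint-space policy.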

Strategy 3: End-to-End with Curriculum Pretraining (Most Flexible)

Train a single end-to-end policy for the full task, but initialize the visual encoder and action decoder weights from primitive skill training.

  • Process: (1) Train primitives, extracting the visual encoder weights. (2) Initialize the full-task policy with the primitive encoder. (3) Fine-tune on the full task's demonstrations.
  • Advantages: No explicit handoff conditions; the policy learns implicit transitions. Most natural motion quality.
  • Limitations: Harder to debug when failures occur. Requires more full-task demonstrations than Strategy 2.
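Step (2) of the process above amounts to copying pretrained encoder parameters into a freshly initialized full-task policy. The sketch below treats weights as a flat name-to-tensor mapping (as in most deep learning frameworks' state dicts); the "encoder." prefix convention is an assumption about your model's parameter naming.

```python
# Sketch of curriculum pretraining transfer: overwrite the full-task
# policy's visual-encoder parameters with the primitive-trained ones,
# leaving the action decoder at its fresh initialization.

def transfer_encoder(primitive_weights, full_task_weights, prefix="encoder."):
    out = dict(full_task_weights)
    for name, value in primitive_weights.items():
        if name.startswith(prefix):
            out[name] = value            # pretrained encoder parameter
    return out                           # decoder entries are untouched
```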

Task Difficulty Levels L1-L5

Level | Task Description | Examples | Demo Requirement | Key Challenge
L1 — Primitive single-step | Top grasp of a flat, stationary object in a fixed position | Pick flat block; open/close gripper on fixed peg | 50-200 | Precise approach trajectory
L2 — Varied grasp | Grasp objects with varied size, position, or orientation | Pick cylinder of varying diameter; grasp in +/-30 deg orientation range | 500-1,000 | Visual generalization
L3 — Two-step manipulation | Sequential actions with state dependency | Pick and place; open box lid then insert item | 1,000-5,000 | State estimation between steps
L4 — Contact-rich assembly | Precise insertion, assembly under uncertainty | USB plug insertion; snap-fit assembly; nut threading | 5,000-20,000 | Reactive force control
L5 — Bimanual deformable | Two arms, deformable objects, long horizons | Fold towel; bag groceries; cut with knife and fork | 20,000+ | Bimanual coordination; deformable state

These ranges assume modern transformer-based imitation learning architectures (ACT, Diffusion Policy). Older behavior cloning methods require 2-5x more demonstrations for the same task. If you are using a newer architecture like pi-0 or a large pre-trained vision-language-action model, the demonstration requirements at L1-L3 may be 3-10x lower due to pre-trained priors.

Demonstration Count Scaling Per Curriculum Stage

Demonstration budgets should follow a non-linear scaling pattern across curriculum stages. The first stage requires relatively few demonstrations, but each subsequent stage requires proportionally more due to increased task complexity.

Curriculum Stage | Demos (Without Curriculum) | Demos (With Curriculum) | Savings | Success Threshold to Advance
L1 — Reach + grasp (fixed) | 100-200 | 100-200 | 0% (baseline) | 95%
L2 — Varied grasp | 500-1,000 | 300-600 | 30-40% | 90%
L3 — Pick-place sequence | 2,000-5,000 | 800-2,000 | 50-60% | 85%
L4 — Contact assembly | 10,000-20,000 | 4,000-8,000 | 50-60% | 80%
L5 — Bimanual deformable | 20,000-50,000 | 8,000-20,000 | 50-60% | 75%

Key insight: the percentage savings from curriculum are highest at L3-L5, exactly where demonstrations are most expensive. At $25-$80 per demonstration (SVRC data services pricing), the curriculum approach saves $50K-$200K on a typical L4 task compared to direct training.

Curriculum for Manipulation: Pick, Stack, Insert

This worked example shows a complete curriculum for a production assembly task: inserting a component into a housing. The final task is L4 (contact-rich insertion), but the curriculum builds up through L1-L3 stages first.

Stage 1: Reach to Component (L1) -- 150 demos

Train the arm to move from home position to a pre-grasp pose 3 cm above the component. Fixed component position, fixed lighting. Success: end-effector within 5 mm of target pre-grasp pose.

  • Hardware: OpenArm 101 (6-DOF, 500g payload) with wrist-mounted RealSense D405
  • Data rate: 30 Hz joint positions + 60 fps RGB from wrist camera
  • Output format: HDF5 with /observations/qpos [6], /observations/images/cam_wrist [480x640x3], /action [6]
  • Evaluation: 20 trials, target >95% success
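The episode layout in the bullets can be written with h5py as below. The array shapes follow the bullet list; the gzip compression, the file naming, and the dummy-episode length are assumptions, and for simplicity this sketch stores camera frames at the same timestep count as joint positions (a real logger may record them at different rates).

```python
import numpy as np
import h5py

def write_episode(path, qpos, images, actions):
    """qpos: (T, 6) joint positions; images: (T, 480, 640, 3) uint8 RGB
    from the wrist camera; actions: (T, 6) commanded joint positions."""
    with h5py.File(path, "w") as f:
        f.create_dataset("/observations/qpos", data=qpos)
        f.create_dataset("/observations/images/cam_wrist", data=images,
                         dtype="uint8", compression="gzip")
        f.create_dataset("/action", data=actions)

# Example: a 10-step dummy episode with zeroed data.
T = 10
write_episode("episode_0000.hdf5",
              np.zeros((T, 6), dtype=np.float32),
              np.zeros((T, 480, 640, 3), dtype=np.uint8),
              np.zeros((T, 6), dtype=np.float32))
```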

Stage 2: Grasp Component (L2) -- 400 demos

Extend from pre-grasp to closed grasp. Component position varied within 15 cm x 15 cm. Three component sizes (small, medium, large). Success: stable grasp verified by lifting 5 cm without drop.

  • Initialize visual encoder from Stage 1 weights
  • Include 10% grasp-failure-and-regrip demonstrations
  • Evaluation: 50 trials across all variants, target >90% success

Stage 3: Transport and Align (L3) -- 800 demos

Pick component, transport to housing, align above insertion point. Success: component aligned within 2 mm and 3 degrees of insertion axis.

  • Initialize from Stage 2 weights (encoder + action decoder)
  • Vary housing position within 10 cm x 10 cm
  • Include 15% off-center-approach-and-correct demonstrations
  • Evaluation: 50 trials, target >85% success

Stage 4: Insert Component (L4) -- 2,000 demos

Complete insertion with force regulation. Add F/T sensor data to observation space. Success: component fully seated (F/T sensor detects seating force signature).

  • Initialize from Stage 3 weights; add F/T observation branch to the network
  • Collect with wrist F/T sensor at 100 Hz alongside vision at 30 Hz
  • Include 20% recovery demonstrations: misaligned approach, correct via force feedback
  • Evaluation: 100 trials, target >80% success

Total with curriculum: 3,350 demonstrations. Total without curriculum: 10,000-20,000 demonstrations. Cost savings at $40/demo: $266K-$666K saved.
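The arithmetic behind those totals, using the stage budgets and the $40/demo figure from this example:

```python
# Stage budgets from the worked example above.
stage_demos = {"reach": 150, "grasp": 400, "transport": 800, "insert": 2000}
total = sum(stage_demos.values())                 # 3,350 demonstrations

cost_per_demo = 40
savings_low = (10_000 - total) * cost_per_demo    # vs. low end of direct training
savings_high = (20_000 - total) * cost_per_demo   # vs. high end of direct training
```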

Evaluation Metrics at Each Stage

Each curriculum stage requires specific evaluation metrics beyond simple success/failure. These metrics diagnose whether the policy is ready to advance or needs more data.

Metric | What It Measures | Target | Stage Applicability
Success rate | Binary task completion | 75-95% (by stage) | All stages
Trajectory smoothness (jerk) | Third derivative of joint positions; lower = smoother | <2x demonstration average | L1-L3
Positional accuracy | End-effector error at target pose | <5 mm for L1-L2; <2 mm for L3-L4 | L1-L4
Episode duration variance | Consistency of task execution timing | CV < 0.3 | L2-L5
Force regulation error | Deviation from target contact force | +/-1 N of target | L4-L5 only
Bimanual sync error | Timing offset between left and right arm actions | <50 ms | L5 only
Generalization gap | Success rate on held-out object positions vs. training positions | <10% drop | L2-L5

Upload all evaluation data to the SVRC data platform for automated metric computation and trend tracking across curriculum stages.
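As one concrete example, the smoothness metric from the table can be computed with a third finite difference of the joint positions. This is a sketch that assumes a constant sampling period; the function names are illustrative.

```python
# Mean absolute jerk via the third finite difference of joint positions,
# and the table's advancement check: policy jerk < 2x demonstration average.

def mean_abs_jerk(qpos, dt):
    """qpos: list of per-timestep joint vectors sampled every dt seconds."""
    n, d = len(qpos), len(qpos[0])
    jerks = []
    for t in range(n - 3):
        for j in range(d):
            # third finite difference: q[t+3] - 3 q[t+2] + 3 q[t+1] - q[t]
            diff3 = qpos[t+3][j] - 3*qpos[t+2][j] + 3*qpos[t+1][j] - qpos[t][j]
            jerks.append(abs(diff3) / dt**3)
    return sum(jerks) / len(jerks)

def smoothness_ok(policy_qpos, demo_qpos, dt=1/30):
    return mean_abs_jerk(policy_qpos, dt) < 2 * mean_abs_jerk(demo_qpos, dt)
```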

Skill Decomposition Example

Consider a cup stacking task (L3): pick up a cup and stack it on a second cup. This task decomposes into three reusable skills:

  • Reach skill: Move end-effector to within 2 cm of the cup handle area, with correct approach angle. Trained independently with 100-200 demos. Reusable for any cup-grasping task.
  • Grasp skill: Close gripper on cup from reach pose, verify grasp success via gripper position feedback. Trained with 50-100 demos (initialized from reach skill weights). Reusable for all cup handling tasks.
  • Place skill: Lower cup onto target cup, align centers, release gripper. The novel skill in cup stacking. Requires 300-500 demos given reach+grasp pretraining. Without pretraining: 1,500-2,500 demos.

Total with curriculum: 450-800 demonstrations. Total without curriculum: 2,000-5,000 demonstrations. The curriculum reduces data requirements by 4-6x for this task.

Transfer Learning Between Tasks

Modern policy architectures share a visual encoder across the observation space. This encoder learns to extract task-relevant features from images. When you train the encoder on a variety of tasks through curriculum, it develops richer representations that transfer more effectively to novel tasks.

  • Shared visual encoder: Train a single ResNet-18 or ViT-small encoder on all curriculum tasks jointly. This encoder learns features like "object edge," "gripper proximity," and "grasp contact" that are useful across many tasks.
  • 3x sample efficiency with curriculum: Empirically, a curriculum-pretrained visual encoder reduces the demonstration requirement for a novel L3 task from 2,000 demos to approximately 600-700 demos — roughly a 3x improvement in sample efficiency.
  • Fine-tuning strategy: For a new task, freeze the encoder weights from the curriculum (or use a low learning rate), and train only the action decoder on the new task's demonstrations. This prevents catastrophic forgetting of previous skills while learning the new task.
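The freeze-vs-low-learning-rate split can be expressed as a per-parameter learning-rate map, which most optimizers can consume via parameter groups. The "encoder." naming convention and the rate values are assumptions.

```python
# Sketch of the fine-tuning split: encoder parameters frozen (lr = 0) or
# slow, action-decoder parameters trained at the full rate on the new task.

def finetune_lrs(param_names, base_lr=1e-4, encoder_lr=0.0,
                 encoder_prefix="encoder."):
    """Return {param_name: learning_rate}; set encoder_lr to a small
    nonzero value instead of 0.0 for low-LR fine-tuning."""
    return {name: (encoder_lr if name.startswith(encoder_prefix) else base_lr)
            for name in param_names}
```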

When to Skip Curriculum

Curriculum design takes time and only pays off for complex tasks. Not every project warrants it:

  • L1-L2 tasks: go direct. Simple single-step or varied-grasp tasks can be trained end-to-end in 100-1,000 demos. The overhead of designing a curriculum exceeds the savings.
  • L3 tasks: situation-dependent. If you have a clear skill decomposition and already have primitive skill data from another project, use curriculum. If starting from scratch for a single L3 task, direct training is often faster.
  • L4-L5 tasks: curriculum is essential. Attempting to train contact-rich assembly or bimanual tasks directly from scratch without curriculum is expensive and usually unsuccessful. At this level, the curriculum design effort is well-justified.
  • Using pre-trained VLAs: If you are fine-tuning a pre-trained vision-language-action model (RT-2, Octo, pi-0), the model already has significant prior knowledge. Curriculum may still help but the baseline performance is much higher, reducing the gap.

Data Collection Order

Collect data in curriculum order — do not collect all task data before beginning training. This allows you to use earlier task policies as a starting point for collecting harder task demonstrations:

  • Collect L1 data first: Train L1 policies to >95% success. These policies can now autonomously collect approach trajectories for L2/L3 data collection (operator corrects near contact, not full approach).
  • Use trained policies for data augmentation: A trained reaching policy can autonomously execute the approach phase while a human operator teleoperates from the grasp phase onward. This reduces operator cognitive load and produces more consistent demonstrations.
  • Prioritize consistency over speed in early stages: L1 and L2 demonstrations should be executed slowly and deliberately. The policy will learn the approximate speed from the data — slow, clear demonstrations are more informative than fast ones that are harder to decompose.
  • Interleave training and collection: After every 100-200 demonstrations, train a checkpoint and evaluate. If the policy is plateauing, collect more diverse demonstrations (new object positions, lighting conditions) rather than simply more of the same.
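The plateau check in the last bullet can be made concrete with a simple rule over the evaluation history. The window size and minimum-gain threshold below are illustrative defaults, not established values.

```python
# Sketch of a plateau detector for the interleaved collect-train-evaluate
# cadence: if recent checkpoints stopped improving, diversify collection
# (new positions, lighting) instead of collecting more of the same.

def plateaued(success_history, window=3, min_gain=0.02):
    """True if the last `window` evaluations improved by less than
    `min_gain` overall."""
    if len(success_history) < window:
        return False
    return success_history[-1] - success_history[-window] < min_gain
```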

Curriculum Design Checklist

  • Define the final target task and its success criteria precisely
  • Decompose into 3-5 curriculum stages from L1 to target level
  • Specify demonstration count budget per stage (use the scaling table above)
  • Define evaluation metrics and advancement thresholds for each stage
  • Plan failure injection at 10-20% for stages L2 and above
  • Set up the data platform for per-stage tracking before collection begins
  • Allocate 20% of budget as reserve for stages that need more demonstrations than planned
  • Plan encoder weight transfer strategy (freeze vs. fine-tune) between stages
  • Schedule operator training on task-specific requirements before each stage
  • Establish a review cadence: evaluate checkpoint every 200 demos, not just at stage end

Related Guides