Test Parallelization Guide

Overview

The test suite can run tests in parallel using test sharding: the tests are split across multiple isolated Docker stacks. This significantly speeds up test execution while keeping each shard isolated from the others.

Quick Start

# Run with 5 shards (default)
yarn test:sharded

# Run with 3 shards (faster startup, less parallelization)
yarn test:sharded:3

# Run with 10 shards (more parallelization, more resource usage)
yarn test:sharded:10

# Custom shard count
./scripts/test-e2e-sharded.sh 8

How It Works

Architecture

  1. Multiple Docker Stacks: Each shard runs in its own isolated Docker Compose stack
     • Different port: 3069, 3070, 3071, etc.
     • Different database: matchzy_tournament_shard1, matchzy_tournament_shard2, etc.
     • Different Docker project: matchzy-test-shard-1, matchzy-test-shard-2, etc.

  2. Playwright Sharding: Uses Playwright's built-in --shard=X/Y feature
     • Splits all tests into Y shards
     • Shard X runs its subset of tests
     • Tests are distributed deterministically

  3. Parallel Execution: All shards run simultaneously
     • Each shard uses 1 worker (no internal parallelism)
     • N shards running in parallel give up to an Nx speedup; in practice the gain is somewhat lower (about 4x with 5 shards, see Performance)
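The per-shard naming scheme above can be sketched in a few lines of shell. This is a minimal illustration, not the project's actual script: BASE_PORT and the function names are assumptions, and the values simply mirror the examples in this guide.

```shell
#!/bin/sh
# Sketch: derive each shard's isolated resources from its 1-based index.
# BASE_PORT and the function names are illustrative assumptions; the values
# mirror the examples in this guide (port 3069, matchzy_tournament_shardN,
# matchzy-test-shard-N).
BASE_PORT=3069

shard_port()    { echo $((BASE_PORT + $1 - 1)); }
shard_db()      { echo "matchzy_tournament_shard$1"; }
shard_project() { echo "matchzy-test-shard-$1"; }

# Print the layout for a given shard count, one line per shard.
print_shard_layout() {
  total=$1
  i=1
  while [ "$i" -le "$total" ]; do
    echo "shard $i/$total port=$(shard_port "$i") db=$(shard_db "$i") project=$(shard_project "$i")"
    i=$((i + 1))
  done
}
```

Calling print_shard_layout 5 prints five lines, the first being "shard 1/5 port=3069 db=matchzy_tournament_shard1 project=matchzy-test-shard-1". Deriving everything from the shard index keeps the stacks from colliding without any shared configuration.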

Example: 5 Shards

Shard 1: Tests 1-50    → Docker stack on port 3069 → Database: matchzy_tournament_shard1
Shard 2: Tests 51-100  → Docker stack on port 3070 → Database: matchzy_tournament_shard2
Shard 3: Tests 101-150 → Docker stack on port 3071 → Database: matchzy_tournament_shard3
Shard 4: Tests 151-200 → Docker stack on port 3072 → Database: matchzy_tournament_shard4
Shard 5: Tests 201-249 → Docker stack on port 3073 → Database: matchzy_tournament_shard5

All 5 shards run simultaneously in parallel processes.
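The "start everything, then wait for everything" pattern behind this can be sketched as follows. This is a hedged illustration rather than the contents of scripts/test-e2e-sharded.sh: run_shard and run_all_shards are hypothetical names, and run_shard is only a placeholder for starting a shard's Docker stack and running Playwright with --shard.

```shell
#!/bin/sh
# Sketch of the fan-out/fan-in pattern: start every shard in the background,
# then wait for each one and count the failures.
# run_shard is a hypothetical placeholder; a real script would start the
# shard's Docker stack and invoke Playwright with --shard=$1/$2 here.
run_shard() {
  echo "running shard $1 of $2"
}

run_all_shards() {
  total=$1
  log_dir=${2:-.}
  pids=""
  i=1
  while [ "$i" -le "$total" ]; do
    # Each shard writes to its own log file (test-output-shard-N.log).
    run_shard "$i" "$total" > "$log_dir/test-output-shard-$i.log" 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done

  failed=0
  for pid in $pids; do
    # wait returns the exit status of that background job.
    wait "$pid" || failed=$((failed + 1))
  done

  # Print the number of failed shards and succeed only if there were none.
  echo "$failed"
  [ "$failed" -eq 0 ]
}
```

Waiting on each process ID separately, instead of a bare wait, is what lets the caller learn which shards failed instead of only that all of them finished.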

Performance

Before (Single Worker)

  • 249 tests × 3 browsers = 747 test runs
  • Sequential execution: ~18-20 minutes
  • Resource usage: Low (1 Docker stack)

After (5 Shards)

  • 249 tests split into 5 shards × 3 browsers
  • Parallel execution: ~4-5 minutes (4x speedup)
  • Resource usage: Medium (5 Docker stacks)

After (10 Shards)

  • 249 tests split into 10 shards × 3 browsers
  • Parallel execution: ~2-3 minutes (8-10x speedup)
  • Resource usage: High (10 Docker stacks)

Resource Requirements

Each shard requires:

  • ~500MB RAM (PostgreSQL + Application)
  • 1 CPU core
  • One port (3069, 3070, etc.)

Recommended shard counts:

  • 3 shards: Good balance, ~6-8 minutes
  • 5 shards: Default, ~4-5 minutes
  • 10 shards: Fast but high resource usage, ~2-3 minutes

System requirements:

  • 8GB+ RAM recommended for 5 shards
  • 16GB+ RAM recommended for 10 shards
  • Ensure Docker has enough resources allocated
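As a rough worked example of the per-shard figures, the sketch below picks a shard count from a memory budget and a core count. MB_PER_SHARD and max_shards are illustrative names, not part of the project's scripts, and the estimate ignores browser and host overhead, which is one reason the RAM recommendations above are more conservative.

```shell
#!/bin/sh
# Sketch: pick a shard count from available resources, using the rough
# per-shard figures above (~500 MB RAM and 1 CPU core each).
# MB_PER_SHARD and max_shards are illustrative assumptions.
MB_PER_SHARD=500

max_shards() {
  avail_mb=$1
  cores=$2
  by_mem=$((avail_mb / MB_PER_SHARD))

  # The tighter of the two limits wins.
  if [ "$by_mem" -lt "$cores" ]; then
    n=$by_mem
  else
    n=$cores
  fi

  # Always run at least one shard.
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}
```

For instance, max_shards 2500 8 yields 5 (memory is the limit), while max_shards 16000 4 yields 4 (cores are the limit). The result could then be passed to ./scripts/test-e2e-sharded.sh as the shard count.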

Usage

Basic Usage

# Default (5 shards)
yarn test:sharded

# With filters (filters apply to all shards)
yarn test:sharded --grep "@api"
yarn test:sharded --project chromium

Advanced Usage

# Custom shard count with arguments
./scripts/test-e2e-sharded.sh 8 --grep "@ui" --project chromium

# Pass through any Playwright arguments
./scripts/test-e2e-sharded.sh 5 --reporter=dot --timeout=60000

Output

Individual Shard Logs

Each shard outputs to: test-output-shard-{N}.log

Individual Reports

Each shard generates: playwright-report-shard-{N}/

Merged Report (if available)

If @playwright/merge-reports is installed:

  • Merged report: playwright-report/index.html
  • Combines all shard results

To install merge-reports:

yarn add -D @playwright/merge-reports

Troubleshooting

Port Conflicts

If you see port conflicts:

# Check what's using the ports
lsof -i :3069-3080

# Kill existing test containers
docker ps | grep matchzy-test-shard | awk '{print $1}' | xargs docker kill

Out of Memory

If Docker runs out of memory:

  • Reduce shard count: yarn test:sharded:3
  • Increase Docker Desktop memory limit
  • Close other applications

Shard Startup Failures

If a shard fails to start:

  • Check logs (replace N with the shard number): docker compose -f docker/docker-compose.local.yml -p matchzy-test-shard-N logs
  • Verify that the shard's port is available
  • Check Docker resources
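When several shards fail at once, it can help to generate the log command for each of them. The sketch below only builds and prints the commands; the compose file path and project prefix are taken from this guide, while shard_compose_cmd and print_log_cmds are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: build the docker compose command for one shard.
# COMPOSE_FILE and the project prefix come from this guide; the function
# names are illustrative assumptions.
COMPOSE_FILE=docker/docker-compose.local.yml

shard_compose_cmd() {
  shard=$1
  shift
  echo "docker compose -f $COMPOSE_FILE -p matchzy-test-shard-$shard $*"
}

# Print one log command per shard so they can be reviewed before running.
print_log_cmds() {
  total=$1
  i=1
  while [ "$i" -le "$total" ]; do
    shard_compose_cmd "$i" logs
    i=$((i + 1))
  done
}
```

Piping the output to a shell, for example print_log_cmds 5 | sh, would run the commands in order. Printing them first keeps the helper safe to try on a machine where the stacks are not running.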

Comparison: Sharding vs Workers

Why Not Multiple Workers?

Within a single stack, all tests share one database, and the system can't handle concurrent test operations against it. Multiple workers would cause:

  • Database conflicts (duplicate keys, race conditions)
  • Test interference (tests modifying shared state)
  • Flaky tests

Why Sharding Works

Each shard has:

  • ✅ Isolated database
  • ✅ Isolated application instance
  • ✅ No shared state
  • ✅ Deterministic test distribution

CI/CD Integration

For CI systems (GitHub Actions, etc.), you can use native sharding:

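# GitHub Actions example: each matrix job runs one Playwright shard.
# Assumes the application stack is already running inside the job.
strategy:
  matrix:
    shard: [1, 2, 3, 4, 5]
steps:
  # test:sharded:N starts N local shards, so call Playwright's --shard directly here
  - run: yarn playwright test --shard=${{ matrix.shard }}/5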

Or use CI's parallel job feature with the sharded script.

Future Improvements

Potential optimizations:

  • [ ] Cache Docker images between shards
  • [ ] Reuse build artifacts
  • [ ] Smart test distribution (group slow tests separately)
  • [ ] Automatic shard count based on available resources