System Design: Performance Review System
System design of a performance review platform covering goal setting, 360-degree feedback collection, calibration workflows, and analytics for enterprise organizations.
Requirements
Functional Requirements:
- Employees set quarterly/annual goals with measurable key results (OKR framework)
- Managers and peers submit structured performance reviews during review cycles (semiannual or annual)
- 360-degree feedback: self-review, manager review, peer reviews (3-5 nominated peers), and optional upward reviews
- Calibration workflow where department leaders normalize ratings across teams to eliminate bias
- Historical performance data with trend visualization across review cycles
- Integration with compensation systems to link performance ratings to merit increases
Non-Functional Requirements:
- Support 5,000 enterprise customers with up to 500K employees each; total 50M employee profiles
- Handle review cycle surge: 80% of an organization's reviews submitted within a 2-week window
- 99.9% availability during review submission windows (business-critical deadlines)
- Strong consistency for review submissions and calibration decisions
- Data encryption and strict access controls (reviews visible only to authorized parties)
Scale Estimation
50M employee profiles across 5,000 enterprises. Review cycles: average 2 per year per employee = 100M review cycles/year. Each cycle produces a self-review, a manager review, and 4 peer reviews = 600M review documents/year. Submissions are concentrated: 300M documents per cycle, with 80% of them (~240M) landing in the 2-week submission window ≈ 200 submissions/sec sustained at peak (assuming customer windows largely overlap at quarter boundaries). Goal updates: 50M employees × 4 quarterly updates = 200M goal events/year. Each review document averages 2KB (structured scores + text feedback) → 1.2TB/year of review data. Calibration sessions: 100K departments × 2 sessions/year = 200K calibration events/year.
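A quick sanity check of the peak-rate arithmetic, in plain Python using only the figures above:

```python
# Back-of-envelope check of the scale estimates (no external dependencies).
employees = 50_000_000
cycles_per_year = 2
docs_per_cycle_per_employee = 6           # self + manager + 4 peers

docs_per_year = employees * cycles_per_year * docs_per_cycle_per_employee
docs_per_cycle = docs_per_year // cycles_per_year     # 300M per cycle
surge_docs = int(docs_per_cycle * 0.8)                # 240M in the 2-week window

window_seconds = 14 * 24 * 3600
print(docs_per_year)                       # 600,000,000 documents/year
print(surge_docs / window_seconds)         # ~198 submissions/sec sustained
print(docs_per_year * 2_000 / 1e12)        # ~1.2 TB/year at 2KB per document
```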
High-Level Architecture
The system follows an event-driven architecture built on a multi-tenant SaaS platform. The Review Cycle Engine is the orchestrator: HR administrators configure a review cycle (dates, review types, participant rules, rating scales) and launch it. The engine generates review tasks for all participants based on organizational hierarchy and peer nomination rules. Each task has a deadline and reminder schedule. The engine tracks completion percentages and sends escalation notifications to managers for incomplete reviews.
The Feedback Collection Service handles review form rendering and submission. Review forms are configurable per organization: structured rubrics (1-5 rating scales on competencies like "Technical Skills", "Leadership", "Communication") plus free-text sections ("Strengths", "Areas for Growth", "Key Accomplishments"). Form configurations are stored as JSON schemas, enabling each customer to customize without code changes. Submissions are validated against the schema, encrypted, and stored in PostgreSQL.
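As a sketch of how schema-driven validation could work, the snippet below uses Python's jsonschema package; the competency names, scale bounds, and required fields are illustrative stand-ins for a tenant's actual form configuration, not the product's real schema.

```python
from jsonschema import validate, ValidationError

# Per-tenant form configuration, stored as a JSON Schema (illustrative fields).
review_form_schema = {
    "type": "object",
    "properties": {
        "ratings": {
            "type": "object",
            "properties": {
                "technical_skills": {"type": "integer", "minimum": 1, "maximum": 5},
                "leadership":       {"type": "integer", "minimum": 1, "maximum": 5},
                "communication":    {"type": "integer", "minimum": 1, "maximum": 5},
            },
            "required": ["technical_skills", "leadership", "communication"],
        },
        "strengths":        {"type": "string", "maxLength": 5000},
        "areas_for_growth": {"type": "string", "maxLength": 5000},
    },
    "required": ["ratings", "strengths"],
}

def validate_submission(payload: dict) -> list[str]:
    """Validate a review submission against the tenant's form schema."""
    try:
        validate(instance=payload, schema=review_form_schema)
        return []
    except ValidationError as e:
        return [e.message]   # surfaced to the client as a form error
```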
The Calibration Service supports collaborative rating normalization. Department leaders view a scatter plot or 9-box grid of their team's ratings, discuss outliers, and adjust ratings to ensure consistency across managers. Calibration sessions use real-time collaboration (WebSocket-based) where multiple leaders can view and discuss changes simultaneously. The session state is persisted after consensus, and adjusted ratings flow back to individual reviews.
Core Components
Review Cycle Engine
The engine models each review cycle as a state machine with phases: Configuration → Nomination → Submission → Calibration → Release. Each phase has entry/exit criteria (e.g., Submission cannot start until Nomination is 90% complete). The engine runs as a scheduled job (every 5 minutes) that evaluates all active cycles, sends reminders for approaching deadlines, and transitions phases when criteria are met. Task generation uses the organizational hierarchy graph: for each employee, the engine creates tasks for self-review, manager-review, and peer-reviews (nominated peers approved by the manager). The engine handles edge cases: employee transfers mid-cycle, manager changes, leaves of absence.
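A condensed sketch of that state machine and the scheduled evaluation loop follows. Only the 90%-nomination rule comes from the design above; the launch and calibration thresholds are assumptions for illustration, and reminders, escalations, and edge-case handling are omitted.

```python
from enum import Enum

class Phase(Enum):
    CONFIGURATION = 1
    NOMINATION = 2
    SUBMISSION = 3
    CALIBRATION = 4
    RELEASE = 5

# Entry criteria per phase. Only the 90% nomination rule is from the design
# text; the other predicates are assumed placeholders.
ENTRY_CRITERIA = {
    Phase.NOMINATION:  lambda c: c.launched,                 # HR admin launches the cycle
    Phase.SUBMISSION:  lambda c: c.nomination_pct >= 0.90,   # rule from the design
    Phase.CALIBRATION: lambda c: c.submission_pct >= 0.80,   # assumed threshold
    Phase.RELEASE:     lambda c: c.calibration_done,
}

class ReviewCycle:
    def __init__(self) -> None:
        self.phase = Phase.CONFIGURATION
        self.launched = False
        self.nomination_pct = 0.0
        self.submission_pct = 0.0
        self.calibration_done = False

    def try_advance(self) -> bool:
        """Move to the next phase if its entry criteria are satisfied."""
        if self.phase is Phase.RELEASE:
            return False
        nxt = Phase(self.phase.value + 1)
        if ENTRY_CRITERIA.get(nxt, lambda c: True)(self):
            self.phase = nxt
            return True
        return False

def evaluate_active_cycles(cycles: list[ReviewCycle]) -> None:
    """Body of the 5-minute scheduled job (reminders/escalations omitted)."""
    for cycle in cycles:
        while cycle.try_advance():   # a cycle may clear several phases at once
            pass
```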
Calibration Workflow
Calibration is modeled as a hierarchical process: team-level calibration (manager + skip-level) → department-level (director + managers) → organization-level (VP + directors). At each level, leaders see aggregated ratings for their scope, identify outliers, and discuss adjustments. The system provides statistical aids: distribution charts showing rating spread vs expected distribution (e.g., forced ranking curves), comparisons across teams, and historical trends. Calibration changes are tracked with full audit trails (original rating, adjusted rating, adjusted_by, reason). A locking mechanism prevents further changes to calibrated ratings unless a senior leader explicitly unlocks them.
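The audit-and-lock rules might look something like the sketch below. The audit fields mirror those listed above (original rating, adjusted rating, adjusted_by, reason); the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CalibratedRating:
    employee_id: str
    rating: int
    locked: bool = False                               # set when calibration finalizes
    audit_log: list = field(default_factory=list)      # append-only trail

    def adjust(self, new_rating: int, adjusted_by: str, reason: str) -> None:
        """Apply a calibration adjustment, recording the full audit trail."""
        if self.locked:
            raise PermissionError("rating is locked; a senior leader must unlock it")
        self.audit_log.append({
            "original_rating": self.rating,
            "adjusted_rating": new_rating,
            "adjusted_by": adjusted_by,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.rating = new_rating
```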
Access Control & Privacy
Performance reviews contain highly sensitive data requiring fine-grained access controls. The system implements a hierarchical permission model: employees see their own reviews (after release), managers see their direct reports' reviews, skip-level managers see aggregated data for their organization, and HR administrators have full access for compliance purposes. Peer review content is anonymized by default (the reviewer's identity is hidden from the reviewee). All review data is encrypted at rest with per-tenant keys and in transit with TLS 1.3. Access to review data is logged in an immutable audit trail.
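A simplified sketch of the hierarchical permission check: the roles and rules follow the paragraph above, while the function signature and data shapes are assumptions.

```python
from enum import Enum, auto

class Role(Enum):
    EMPLOYEE = auto()
    MANAGER = auto()
    SKIP_LEVEL = auto()
    HR_ADMIN = auto()

def can_read_review(viewer_id: str, viewer_role: Role, reviewee_id: str,
                    direct_reports: set[str], released: bool) -> bool:
    """Hierarchical permission check, simplified from the rules above."""
    if viewer_role is Role.HR_ADMIN:
        return True                                   # full access for compliance
    if viewer_id == reviewee_id:
        return released                               # own reviews, only after release
    if viewer_role is Role.MANAGER and reviewee_id in direct_reports:
        return True                                   # direct reports' reviews
    return False   # skip-levels see aggregated data, never raw review documents
```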
Database Design
Core tables live in PostgreSQL with row-level security (RLS) for tenant isolation:
- review_cycles (cycle_id, tenant_id, name, start_date, end_date, phases JSONB, status)
- review_tasks (task_id, cycle_id, reviewer_id, reviewee_id, review_type ENUM(self, manager, peer, upward), status, deadline, submitted_at)
- review_responses (response_id, task_id, ratings JSONB, feedback_text_encrypted, submitted_at, calibrated_rating nullable, calibrated_by nullable)
- goals (goal_id, employee_id, tenant_id, title, description, key_results JSONB, status ENUM(draft, active, completed, deferred), quarter, year, progress_pct)
Indexes: (tenant_id, cycle_id, status) for dashboard queries, (reviewer_id, cycle_id) for "my pending reviews", (reviewee_id, cycle_id) for individual review summaries. The organizational hierarchy is stored as an adjacency list: org_relationships (employee_id, manager_id, effective_date, end_date) enabling point-in-time hierarchy reconstruction (important when a manager changes mid-cycle). Calibration session state uses a separate table with JSONB snapshots of the rating grid at each save point.
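For example, the point-in-time manager lookup against org_relationships could be written as below. This is a sketch using psycopg2; the column names follow the table definition above, while the connection handling and date format are assumptions.

```python
import psycopg2  # assumes a reachable PostgreSQL instance

# Reconstruct who managed an employee on a given date: the relationship row
# whose validity interval [effective_date, end_date) contains as_of.
POINT_IN_TIME_MANAGER = """
    SELECT manager_id
    FROM org_relationships
    WHERE employee_id = %(employee_id)s
      AND effective_date <= %(as_of)s
      AND (end_date IS NULL OR end_date > %(as_of)s)
"""

def manager_as_of(conn, employee_id: str, as_of: str) -> str | None:
    """Return the employee's manager on the given date, or None."""
    with conn.cursor() as cur:
        cur.execute(POINT_IN_TIME_MANAGER,
                    {"employee_id": employee_id, "as_of": as_of})
        row = cur.fetchone()
        return row[0] if row else None
```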
API Design
- POST /api/v1/cycles — Create a review cycle; body contains dates, review types, rating schema; returns cycle_id
- GET /api/v1/tasks?reviewer_id={id}&cycle_id={id}&status=pending — Fetch pending review tasks for a reviewer
- POST /api/v1/tasks/{task_id}/submit — Submit a review; body contains ratings JSONB and feedback text; returns response_id
- PUT /api/v1/calibration/{cycle_id}/adjustments — Submit calibration rating adjustments; body contains [{employee_id, adjusted_rating, reason}]
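A hypothetical client call for the submission endpoint: the host, auth scheme, task id, and payload field names are illustrative; only the route and the ratings-plus-feedback shape come from the API above.

```python
import requests

resp = requests.post(
    "https://api.example.com/api/v1/tasks/task_123/submit",  # hypothetical host/id
    headers={"Authorization": "Bearer <token>"},
    json={
        "ratings": {"technical_skills": 4, "leadership": 3, "communication": 5},
        "feedback": {
            "strengths": "Drove the Q3 migration to completion.",
            "areas_for_growth": "Delegate more during incident response.",
        },
    },
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["response_id"])   # returned per the API contract above
```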
Scaling & Bottlenecks
The review submission surge is the primary scaling challenge. During a 2-week window, 80% of an organization's employees submit reviews, creating a 10x traffic spike compared to baseline. The system pre-provisions database connections and application server capacity 1 week before known cycle deadlines. Write-heavy workload during submission is handled by batching review submissions in an in-memory queue and flushing to PostgreSQL in bulk every 500ms, reducing database round-trips by 50x. Read replicas serve dashboard queries (completion rates, aggregate statistics) to offload the primary.
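A minimal sketch of the 500ms micro-batching write path, assuming an execute_values()-style bulk insert behind bulk_insert; the names and the in-process queue are illustrative.

```python
import queue
import threading
import time

submission_queue: "queue.Queue[dict]" = queue.Queue()

def flusher(bulk_insert, interval: float = 0.5) -> None:
    """Collect submissions for `interval` seconds, then write them in one batch."""
    while True:
        deadline = time.monotonic() + interval
        batch: list[dict] = []
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(submission_queue.get(timeout=remaining))
            except queue.Empty:
                break
        if batch:
            bulk_insert(batch)   # one DB round-trip instead of len(batch)

# `print` stands in for the real bulk INSERT for demonstration purposes.
threading.Thread(target=flusher, args=(print,), daemon=True).start()
```

One caveat worth noting with this design: a purely in-memory buffer loses unflushed submissions on a crash, so requests should only be acknowledged after their batch commits.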
Calibration sessions with 50+ participants create WebSocket fan-out challenges. Each change to a rating must be broadcast to all session participants in real-time. The system uses a Redis Pub/Sub channel per calibration session; each WebSocket server subscribes to relevant channels. For large sessions, rate limiting UI updates to 2 per second (batching intermediate changes) prevents visual noise and reduces WebSocket traffic.
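A sketch of the per-session fan-out with coalescing, using redis-py; the channel naming and message shape are assumptions, and a production relay would also flush pending changes on a timer rather than only on arrival.

```python
import json
import time
import redis  # assumes a reachable Redis instance

r = redis.Redis()

def publish_adjustment(session_id: str, employee_id: str, rating: int) -> None:
    """Broadcast one rating change to every WebSocket server in the session."""
    r.publish(f"calibration:{session_id}",
              json.dumps({"employee_id": employee_id, "rating": rating}))

def relay(session_id: str, send_to_clients, max_rate_hz: float = 2.0) -> None:
    """Subscriber loop on a WebSocket server: coalesce bursts to ~2 updates/sec."""
    sub = r.pubsub()
    sub.subscribe(f"calibration:{session_id}")
    pending: dict[str, dict] = {}
    last_sent = 0.0
    for msg in sub.listen():
        if msg["type"] != "message":
            continue
        change = json.loads(msg["data"])
        pending[change["employee_id"]] = change      # keep only the latest per employee
        now = time.monotonic()
        if now - last_sent >= 1.0 / max_rate_hz:
            send_to_clients(list(pending.values()))  # one batched UI update
            pending.clear()
            last_sent = now
```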
Key Trade-offs
- Configurable review forms (JSON schema) vs fixed templates: JSON schemas allow each enterprise to customize rating scales, competencies, and feedback sections without code changes, but increase UI rendering complexity and make cross-customer analytics harder
- Hierarchical calibration vs flat calibration: Multiple calibration levels (team → department → org) produce fairer ratings but extend the review cycle timeline by 2-3 weeks — most enterprises accept this trade-off for rating consistency
- Anonymized peer feedback vs attributed: Anonymity encourages honest feedback but prevents follow-up conversations — the system lets each organization configure their policy, with anonymity as the default
- Surge-provisioned infrastructure vs always-on capacity: Pre-provisioning for review cycles saves 60% on infrastructure costs during the 48 non-surge weeks but requires capacity planning coordination with customer success teams