1Z0-1111-25 Syllabus — Learning Objectives by Topic

Learning objectives for OCI 2025 Observability Professional (1Z0-1111-25), organized by topic with quick links to targeted practice.

Use this syllabus as your source of truth for 1Z0‑1111‑25.

What’s covered

Topic 1: Observability Foundations and Signal Strategy

Practice this topic →

1.1 Observability pillars and SLO thinking

  • Define observability vs monitoring and explain why observability reduces time-to-detect and time-to-recover.
  • Differentiate metrics, logs, and traces and choose the primary signal for common incident scenarios.
  • Explain SLIs and SLOs and why alerting should align with user impact rather than internal noise.
  • Given a scenario, choose leading indicators vs lagging indicators for early detection.
  • Recognize common anti-patterns: alerting on every metric, dashboards without questions, and alerts without owners.
  • Explain how environment separation and tagging improve observability governance and incident triage.
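The SLO arithmetic behind error-budget thinking can be sketched in a few lines of Python; the 99.9% target and 30-day window here are illustrative assumptions, not values from the exam:

```python
# Sketch: translate an SLO target into an error budget for a rolling window.
# The target (99.9%) and window (30 days) are illustrative, not prescribed.

def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for a given SLO over the window."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes

budget = error_budget_minutes(0.999)  # 99.9% availability SLO
print(f"Error budget: {budget:.1f} minutes per 30 days")  # ~43.2 minutes
```

The useful habit this builds: alerts aligned with user impact are really alerts on error-budget burn rate, not on raw internal metrics.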

1.2 Instrumentation and telemetry collection principles

  • Describe how instrumentation choices affect cardinality, cost, and diagnostic usefulness (concept-level).
  • Given a scenario, decide what to sample vs capture at full fidelity (concept-level).
  • Explain why structured logs improve search, parsing, and correlation.
  • Identify how correlation IDs propagate across services to connect logs and traces (concept-level).
  • Given a scenario, choose where to collect telemetry (agent, SDK, platform logs) conceptually.
  • Recognize privacy and security concerns: PII in logs, tenancy boundaries, and retention controls.
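As a concept-level illustration of why structured logs correlate well, the sketch below emits one JSON line per event with a consistent timestamp and correlation ID. The field names (`ts`, `correlation_id`, `service`) are an assumed convention, not an OCI schema:

```python
import json
import uuid
from datetime import datetime, timezone

def structured_log(level: str, message: str, correlation_id: str, **fields) -> str:
    """Emit one JSON log line with a consistent timestamp and correlation ID."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "correlation_id": correlation_id,
        **fields,
    }
    return json.dumps(record)

cid = str(uuid.uuid4())
line = structured_log("ERROR", "payment failed", cid,
                      service="checkout", order_id="ord-123")
print(line)
```

Because every line parses to the same key set, searching for one `correlation_id` across services is a single filter rather than a free-text grep.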

1.3 Dashboards, runbooks, and operational readiness

  • Design dashboards that separate high-level health from deep-dive diagnostics and reduce time-to-diagnose.
  • Given a scenario, create an incident dashboard that surfaces key service health indicators and dependencies.
  • Explain why runbooks and annotations reduce incident duration and improve reliability.
  • Identify what to document: known failure modes, mitigation steps, verification checks, and rollback procedures.
  • Given a scenario, choose alert routing and ownership patterns that reduce time-to-acknowledge.
  • Recognize post-incident practices: blameless postmortems, action items, and alert reviews.

Topic 2: OCI Monitoring (Metrics) and Alarm Management

Practice this topic →

2.1 Metrics model and namespaces

  • Explain OCI Monitoring metric concepts: namespaces, dimensions, intervals, and statistics (concept-level).
  • Given a scenario, choose the correct metric and dimension to isolate a resource, fleet, or compartment scope.
  • Recognize the risk of high-cardinality dimensions and how to constrain them for cost and usability.
  • Explain the difference between built-in service metrics and custom/application metrics (concept-level).
  • Given a scenario, design a naming standard for custom metrics and consistent dimensions.
  • Identify how tags and compartments affect monitoring views, permissions, and reporting.
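The high-cardinality risk is easy to quantify: the worst-case number of time series is the product of per-dimension cardinalities. A minimal sketch (the dimension names and counts are made up for illustration):

```python
from math import prod

def estimated_series_count(dimension_values: dict[str, int]) -> int:
    """Worst-case time-series count is the product of per-dimension cardinalities."""
    return prod(dimension_values.values())

dims = {"service": 20, "endpoint": 50, "status_code": 8}
print(estimated_series_count(dims))  # 20 * 50 * 8 = 8000 potential series
# Adding a user_id dimension with ~100k values would multiply this by 100,000 —
# exactly the kind of explosion to constrain before publishing custom metrics.
```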

2.2 Alarms, notifications, and routing

  • Create alarms with appropriate thresholds, evaluation windows, and suppression to avoid flapping.
  • Given a scenario, choose Notifications topics and subscriptions for alert delivery and escalation.
  • Explain routing patterns for alerts (email, SMS, webhooks) and how to integrate with incident systems (concept-level).
  • Recognize when composite alarms or multi-condition alarms reduce noise (concept-level).
  • Given a scenario, design escalation paths and maintenance window behavior conceptually.
  • Identify troubleshooting steps when alarms are missing or delayed (permissions, metric gaps, mis-scoping).
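Flap suppression via an evaluation window can be illustrated conceptually: fire only after several consecutive breaching datapoints. This is a simplified model of the idea, not the OCI alarm engine:

```python
def alarm_fires(datapoints: list[float], threshold: float, consecutive: int = 3) -> bool:
    """Fire only after `consecutive` datapoints breach the threshold,
    which suppresses flapping caused by a single transient spike."""
    streak = 0
    for value in datapoints:
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return True
    return False

print(alarm_fires([85, 92, 70, 95, 96, 97], threshold=90))  # True: three in a row
print(alarm_fires([85, 92, 70, 95, 70, 97], threshold=90))  # False: isolated spikes
```

The trade-off to reason about in scenarios: a longer window means fewer false alarms but slower time-to-detect.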

2.3 Dashboards and custom monitoring

  • Build dashboards with charts that answer specific questions (availability, latency, saturation) rather than generic telemetry.
  • Given a scenario, choose appropriate aggregation (sum, average, percentiles) for the metric and question.
  • Explain how metric math/rate calculations support meaningful alerts (concept-level).
  • Identify ways to publish custom metrics from applications, jobs, and automation (concept-level).
  • Given a scenario, design monitoring for ephemeral/serverless workloads (functions, jobs) conceptually.
  • Recognize cost considerations for high-resolution metrics, many alarms, and noisy dashboards.
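Why percentiles rather than averages: a hypothetical latency sample where the mean looks healthy while the tail does not. Nearest-rank p95 is shown; the values are invented:

```python
import math
import statistics

def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile: the latency 95% of requests beat."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

# 18 fast requests and 2 slow ones: the mean hides the tail, p95 exposes it.
latencies = [100.0] * 18 + [3000.0, 5000.0]
print(f"mean={statistics.mean(latencies):.0f}ms  p95={p95(latencies):.0f}ms")
# mean=490ms  p95=3000ms
```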

Topic 3: OCI Logging and Log Analytics

Practice this topic →

3.1 Log sources, log groups, and ingestion

  • Identify OCI log sources: service logs, audit logs, and custom/application logs (concept-level).
  • Given a scenario, design log group structure and retention by environment and sensitivity.
  • Explain onboarding patterns for compute and application logs (agent vs service logs) conceptually.
  • Recognize the need for consistent timestamps and fields to enable correlation across telemetry sources.
  • Given a scenario, choose masking/exclusion strategies to prevent sensitive data leakage in logs.
  • Identify common ingestion failures and remediation steps (permissions, agent config, networking).
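A masking/exclusion strategy can be sketched as regex redaction applied before ingestion. The patterns below are illustrative only, not production-grade PII detection:

```python
import re

# Illustrative patterns: a simple email matcher and a card-like digit run.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def mask_pii(line: str) -> str:
    """Redact emails and card-like digit runs before a log line is ingested."""
    line = EMAIL.sub("[EMAIL]", line)
    line = CARD.sub("[CARD]", line)
    return line

print(mask_pii("login failed for alice@example.com card 4111 1111 1111 1111"))
```

Masking at the source keeps sensitive values out of every downstream store, which is cheaper than trying to purge them after ingestion.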

3.2 Search, parsing, and correlation (Log Analytics)

  • Explain how Log Analytics supports search, parsing, and correlation across log sources (concept-level).
  • Given a scenario, write filters/queries to isolate errors, outliers, or suspicious activity (concept-level).
  • Identify the purpose of parsing rules and how structured logs reduce parsing complexity.
  • Given a scenario, create log-based metrics and alerts (concept-level).
  • Recognize approaches to correlate logs with metrics and traces using IDs and timestamps (concept-level).
  • Explain how retention and storage choices affect log analytics cost and compliance.
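Deriving a log-based metric from structured lines can be sketched without any service dependency — here, ERROR counts bucketed per minute (the field names are an assumed convention):

```python
import json
from collections import Counter

def errors_per_minute(log_lines: list[str]) -> Counter:
    """Derive a log-based metric: ERROR count bucketed by minute of timestamp."""
    counts: Counter = Counter()
    for raw in log_lines:
        rec = json.loads(raw)
        if rec.get("level") == "ERROR":
            counts[rec["ts"][:16]] += 1  # "YYYY-MM-DDTHH:MM" bucket
    return counts

lines = [
    '{"ts": "2025-01-01T10:00:05Z", "level": "ERROR", "message": "timeout"}',
    '{"ts": "2025-01-01T10:00:40Z", "level": "INFO",  "message": "ok"}',
    '{"ts": "2025-01-01T10:00:59Z", "level": "ERROR", "message": "timeout"}',
]
print(errors_per_minute(lines))  # Counter({'2025-01-01T10:00': 2})
```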

3.3 Audit, security, and governance for logs

  • Describe OCI Audit log purpose and identify high-value events to monitor (IAM/network changes, privileged actions).
  • Given a scenario, implement access controls that prevent unauthorized log access or tampering (concept-level).
  • Recognize chain-of-custody requirements for investigations and long-term evidence handling (concept-level).
  • Explain how to export logs to Object Storage or external analysis destinations (concept-level).
  • Given a scenario, design log retention and deletion policies aligned to compliance requirements.
  • Identify log governance anti-patterns: everyone can read, no retention, and logging secrets or credentials.

Topic 4: APM, Tracing, and Performance Investigation

Practice this topic →

4.1 Tracing fundamentals and service maps

  • Define traces, spans, and context propagation and explain how tracing reveals latency and dependency structure.
  • Given a scenario, choose where to instrument to capture the critical path (frontend/API/database).
  • Explain sampling strategies and trade-offs for high-traffic services (concept-level).
  • Recognize common tracing pitfalls: missing context, async boundaries, and sampling bias (concept-level).
  • Given a scenario, use service maps to identify upstream/downstream dependencies conceptually.
  • Identify how correlation IDs join traces with logs and metrics for faster diagnosis (concept-level).
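Correlation-ID propagation reduces to "reuse if present, mint at the edge, always forward." A minimal sketch; the header name here is an assumed convention, not a standard like W3C `traceparent`:

```python
import uuid

CORRELATION_HEADER = "X-Correlation-ID"  # illustrative convention, not a standard

def inbound_correlation_id(headers: dict) -> str:
    """Reuse the caller's ID when present; mint one at the edge otherwise."""
    return headers.get(CORRELATION_HEADER) or str(uuid.uuid4())

def outbound_headers(correlation_id: str) -> dict:
    """Forward the same ID on every downstream call so logs and spans join up."""
    return {CORRELATION_HEADER: correlation_id}

cid = inbound_correlation_id({})            # edge service mints the ID
downstream = outbound_headers(cid)          # forwarded to the next hop
assert inbound_correlation_id(downstream) == cid  # every hop sees the same ID
```

The async-boundary pitfall in the objectives is exactly the case where this forwarding step is forgotten — a queued message without the header breaks the chain.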

4.2 OCI Application Performance Monitoring (APM) concepts

  • Explain APM capabilities: traces, service maps, errors, and performance signals (concept-level).
  • Given a scenario, choose APM for troubleshooting latency vs relying on metrics and logs alone.
  • Recognize how APM can identify slow database calls, external dependency latency, and error hotspots (concept-level).
  • Explain how to define alert conditions for APM signals (errors, latency) conceptually.
  • Given a scenario, segment performance by dimension (service, endpoint, region) conceptually.
  • Identify how to troubleshoot instrumentation issues (missing traces, agent misconfiguration) conceptually.

4.3 Performance investigation workflows

  • Given a scenario, follow a systematic workflow: reproduce → isolate → measure → fix → verify.
  • Explain the difference between symptom metrics (CPU) and cause metrics (queue backlogs) conceptually.
  • Recognize common bottlenecks: saturation, contention, slow I/O, and mis-sized instances.
  • Given a scenario, choose load testing and synthetic checks to validate performance improvements (concept-level).
  • Identify how to use annotations and change tracking to link regressions to deployments or config changes.
  • Explain why profiling and tracing carry runtime overhead and must be applied cautiously in production (concept-level).

Topic 5: Telemetry Routing, Automation, and Integration Patterns

Practice this topic →

5.1 Service Connector Hub and telemetry pipelines

  • Explain Service Connector Hub purpose: move telemetry between OCI services reliably (concept-level).
  • Given a scenario, route logs to Log Analytics or Object Storage using Service Connector Hub.
  • Recognize transformation, filtering, and batching needs in telemetry pipelines (concept-level).
  • Explain failure handling: retries, dead-letter patterns, and monitoring connectors conceptually.
  • Given a scenario, design least-privilege policies for connectors and their destinations.
  • Identify cost and latency trade-offs when exporting high-volume telemetry.
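Connector-style failure handling usually rests on retries with exponential backoff and jitter. A sketch of the delay schedule only (the base, cap, and attempt count are illustrative):

```python
import random

def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Exponential backoff with full jitter: the ceiling doubles per retry up to
    a cap, and each delay is randomized to avoid synchronized retry storms."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

for i, delay in enumerate(backoff_delays(5), start=1):
    print(f"retry {i}: wait {delay:.2f}s")
```

Retries alone are not enough: events that still fail after the schedule should go to a dead-letter destination, and the connector itself should be monitored like any other service.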

5.2 Notifications, events, and automated responses

  • Differentiate Monitoring alarms, Events, and Notifications and explain how they compose into workflows.
  • Given a scenario, trigger automation (Functions/webhooks) when alarms fire to create tickets or remediate (concept-level).
  • Explain how to design safe responders that avoid runaway automation or unsafe changes.
  • Recognize when automation must be gated with human approval for risky actions (concept-level).
  • Given a scenario, design alert deduplication and suppression during planned maintenance windows.
  • Identify patterns to integrate with external incident systems via HTTPS endpoints and webhooks (concept-level).
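Deduplication typically hinges on a stable fingerprint over an alert's identity fields while ignoring volatile ones like timestamps. A hypothetical sketch (the field names are invented):

```python
import hashlib
import json

def alert_fingerprint(alert: dict) -> str:
    """Stable fingerprint over identity fields only — a changed timestamp or
    severity must not create a 'new' alert in the incident system."""
    identity = {k: alert[k] for k in ("service", "alarm", "resource")}
    payload = json.dumps(identity, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

a = {"service": "checkout", "alarm": "HighLatency", "resource": "web-1", "ts": "10:00"}
b = {"service": "checkout", "alarm": "HighLatency", "resource": "web-1", "ts": "10:05"}
print(alert_fingerprint(a) == alert_fingerprint(b))  # True: same incident, deduped
```

The same fingerprint doubles as an idempotency key for webhook deliveries, so retries don't open duplicate tickets.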

5.3 Cross-service correlation and multi-scope views

  • Explain how compartments, tags, and resource identifiers enable correlation across telemetry sources.
  • Given a scenario, design dashboards that aggregate across compartments or regions while preserving clarity.
  • Recognize challenges of multi-tenancy/multi-account views: access boundaries, naming, and data separation (concept-level).
  • Explain how to normalize telemetry fields (service name, env, region) to support correlation at scale.
  • Given a scenario, design centralized observability while preserving least privilege and privacy controls.
  • Identify governance practices: standard log formats, metric naming conventions, and alert ownership.
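Field normalization can be as simple as an alias map onto a shared schema. The aliases below are invented examples of the kind of drift that accumulates across teams:

```python
# Map team-specific field names onto one shared schema (aliases are illustrative).
FIELD_ALIASES = {
    "svc": "service", "service_name": "service",
    "environment": "env", "stage": "env",
    "location": "region",
}

def normalize(record: dict) -> dict:
    """Canonicalize field names and lowercase string values so cross-compartment
    queries can group on one consistent key set."""
    out = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key, key)
        out[canonical] = value.lower() if isinstance(value, str) else value
    return out

print(normalize({"svc": "Checkout", "stage": "Prod", "location": "us-ashburn-1"}))
# {'service': 'checkout', 'env': 'prod', 'region': 'us-ashburn-1'}
```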

Topic 6: Incident Response, Compliance, and Optimization

Practice this topic →

6.1 Incident triage and troubleshooting scenarios

  • Given a scenario, triage an outage using metrics first, then logs and traces to isolate the likely cause.
  • Explain how to distinguish availability incidents from performance incidents using observable signals.
  • Recognize common failure modes: DNS misconfiguration, routing blocks, expired certificates, and quota limits (concept-level).
  • Given a scenario, choose remediation steps that reduce blast radius (feature flags, rollback, throttling).
  • Identify how to communicate incident status, scope, and timelines effectively during an active event.
  • Explain how to verify recovery with evidence (health checks, normalized metrics) after remediation.

6.2 Compliance, retention, and privacy operations

  • Define retention requirements and how they affect telemetry storage and access controls.
  • Given a scenario, implement least-privilege access for observability data and separate admin vs viewer roles.
  • Recognize how to handle sensitive data in telemetry (masking, sampling, exclusion) conceptually.
  • Explain auditability requirements for who changed alerts/dashboards and when (concept-level).
  • Given a scenario, design evidence collection for audits using exported logs, dashboards, and reports.
  • Identify pitfalls: collecting more data than needed, unclear ownership, and unbounded retention.

6.3 Cost and scaling considerations for observability

  • Identify major cost drivers: high-cardinality metrics, large log volumes, and long retention windows.
  • Given a scenario, reduce observability cost without losing critical signals (sampling, aggregation, better log levels).
  • Explain how to set appropriate log levels and use structured logging to reduce noise and improve queryability.
  • Recognize when to archive telemetry to Object Storage for long-term retention (concept-level).
  • Given a scenario, design tiered alerting and dashboards that scale with organization growth.
  • Explain how continuous improvement (postmortems, alert reviews) keeps observability sustainable.
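Sampling that "reduces cost without losing critical signals" often uses a deterministic hash of the trace ID, so every service keeps or drops the same traces without coordination. A sketch under assumed parameters (the 10% rate and ID format are illustrative):

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float = 0.1) -> bool:
    """Hash-based sampling: the decision is a pure function of the trace ID,
    so all services agree on which traces survive — no coordination needed."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 10_000
    return bucket < sample_rate * 10_000

kept = sum(keep_trace(f"trace-{i}") for i in range(10_000))
print(f"kept {kept} of 10000 traces (~10%)")
```

Because the decision is deterministic, a kept trace is complete end to end — unlike independent per-service coin flips, which produce fragmented traces.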