As your applications and their data requirements change, the structure of your Kafka messages also needs to adapt. Effective schema lifecycle management is crucial for handling these changes smoothly and maintaining data integrity. This process involves not just changing schemas, but also systematically controlling the kinds of changes that are safe, or sufficiently compatible, for the applications that depend on them.
Managed Service for Apache Kafka schema registry supports the full lifecycle of schema management and includes the following features:
Define and enforce compatibility rules (compatibility type) to manage schema evolution when new schema versions are introduced. These rules ensure that producers and consumers continue to operate correctly.
Configure operational controls (schema modes) to manage the mutability of schemas at different levels, safeguarding your data processing pipelines.
Manage schema references to promote reusability and consistency across your schemas.
How schema evolution works
You modify your schema definition. For example, you add an optional field to your .proto or .avsc file.
A producer configured with auto.register.schemas=true sends a message using the new schema, or you explicitly attempt to register the new schema using the API or client libraries.
When a registration request for a new version reaches the schema registry, it retrieves the configured compatibility rule for the target subject. It compares the proposed new schema against the required previous version(s) according to that rule.
If the schema version is compatible, the new schema is successfully registered as the next version under the subject, assigned a new version number, and potentially a new schema_id if the definition is unique. The producer (if applicable) receives the schema_id to include with messages.
If the schema version is incompatible, the registration attempt fails, and an error is returned.
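The registration flow described above can be sketched as a small in-memory model. This is a toy illustration, not the service's implementation: ToyRegistry, IncompatibleSchemaError, and the is_compatible callback are hypothetical names, schemas are plain strings, and the check only compares against the latest version.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class IncompatibleSchemaError(Exception):
    """Raised when a proposed schema fails the compatibility check."""


@dataclass
class ToyRegistry:
    # Stand-in for the subject's configured rule: (new_schema, previous_schema) -> bool.
    is_compatible: Callable[[str, str], bool]
    ids: Dict[str, int] = field(default_factory=dict)             # definition -> schema_id
    schemas: Dict[int, str] = field(default_factory=dict)         # schema_id -> definition
    subjects: Dict[str, List[int]] = field(default_factory=dict)  # subject -> schema_ids by version

    def register(self, subject: str, schema: str) -> Tuple[int, int]:
        """Register schema under subject and return (version, schema_id)."""
        history = self.subjects.setdefault(subject, [])
        known_id = self.ids.get(schema)
        if known_id in history:
            # Already registered under this subject: return the existing version.
            return history.index(known_id) + 1, known_id
        if history and not self.is_compatible(schema, self.schemas[history[-1]]):
            raise IncompatibleSchemaError(f"schema rejected for subject {subject!r}")
        # A definition seen before keeps its schema_id; a new one gets the next id.
        schema_id = known_id if known_id is not None else len(self.ids) + 1
        self.ids[schema] = schema_id
        self.schemas[schema_id] = schema
        history.append(schema_id)
        return len(history), schema_id
```

In this sketch a compatible schema becomes the next version of the subject, an identical definition reuses its existing schema_id, and an incompatible schema raises an error without changing the subject.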
About compatibility type
Schema compatibility lets you define how the schema registry handles compatibility checks between different schema versions. You can apply these configurations at various levels within the schema registry hierarchy, as indicated by the following resource pattern options:
Registry-level: Sets default configuration for the entire schema registry.
- Path: projects/project/locations/location/schemaRegistries/schema_registry/config
Subject-level within default context: Sets specific configuration for a subject within the registry's default context.
- Path: projects/project/locations/location/schemaRegistries/schema_registry/config/subject
Subject-level within a specific context: Sets specific configuration for a subject within a named context.
- Path: projects/project/locations/location/schemaRegistries/schema_registry/contexts/context/config/subject
Configurations set at the subject level override those set at the registry level.
If a setting is not specified at the subject level, it inherits the value from the registry level. If not explicitly set at the registry level, the default is Backward.
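The override and inheritance rules can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: effective_compatibility and its arguments are hypothetical names used for illustration, and the configuration is modeled as plain dictionaries rather than API resources.

```python
from typing import Dict, Optional

# Default applied when nothing is set at the registry level.
DEFAULT_COMPATIBILITY = "Backward"


def effective_compatibility(
    subject: str,
    subject_config: Dict[str, str],
    registry_config: Optional[str] = None,
) -> str:
    """Resolve the compatibility type that applies to a subject."""
    if subject in subject_config:
        # A subject-level setting overrides the registry-level setting.
        return subject_config[subject]
    if registry_config is not None:
        # Otherwise the subject inherits the registry-level setting.
        return registry_config
    return DEFAULT_COMPATIBILITY
```

For example, a hypothetical subject named orders with no explicit setting resolves to the registry-level value, or to Backward if the registry has none.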
The following available types determine how the schema registry compares a new schema version against previous ones:
None: No compatibility checks are performed. Allows any change, but carries a high risk of breaking clients.
Backward (Default): Consumer applications using the new schema can decode data produced with only the previously registered schema. This allows adding optional fields and deleting fields. Consumers must be upgraded before producers.
Backward_transitive: Consumer applications using the new schema can decode data produced with all previous schema versions in that subject. This setting is stricter than Backward.
Forward: Data produced using the new schema must be readable by clients using the previously registered schema. Producers must be upgraded first, but consumers using the new schema might not be able to read data produced with even older schemas. This setting allows deleting optional fields and adding fields.
Forward_transitive: Data produced using the new schema must be readable by clients using any previous schema version. This setting is stricter than Forward.
Full: The new schema is both backwards- and forwards-compatible with the previously registered schema version. Clients can be upgraded in any order relative to the producer using the new schema. Allows adding or deleting optional fields.
Full_transitive: The new schema is both backwards- and forwards-compatible with all previous schema versions in that subject. This setting is stricter than Full.
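The difference between these types can be illustrated with a deliberately simplified model. In the sketch below, a schema is only a mapping from field name to whether the field is optional, so type changes and format-specific rules for Avro or Protobuf are ignored. The names can_read and is_compatible are hypothetical and not part of any client library.

```python
from typing import Dict, List

# Simplified schema model: field name -> True if the field is optional.
Schema = Dict[str, bool]


def can_read(reader: Schema, writer: Schema) -> bool:
    """A reader can decode data if each field it expects is present or optional."""
    return all(name in writer or optional for name, optional in reader.items())


def is_compatible(new: Schema, history: List[Schema], compat: str) -> bool:
    """Check a proposed schema against earlier versions (oldest first)."""
    if compat == "None" or not history:
        return True
    base = compat.split("_")[0]  # "Backward_transitive" -> "Backward"
    # Transitive types compare against every version, others only the latest.
    previous = history if compat.endswith("_transitive") else history[-1:]
    for old in previous:
        backward = can_read(reader=new, writer=old)  # new consumers, old data
        forward = can_read(reader=old, writer=new)   # old consumers, new data
        checks = {"Backward": backward, "Forward": forward, "Full": backward and forward}
        if not checks[base]:
            return False
    return True


v1 = {"id": False}
v2 = {"id": False, "email": True}    # adds an optional field
v3 = {"id": False, "email": False}   # makes the field required
```

With this history, registering v3 after v1 and v2 passes a Backward check, because v2 data always carries the email field as far as this model is concerned, but fails Backward_transitive, because v1 data has no email field and v3 provides no default for it.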
Example for compatibility type
Assume you have a schema registry with Backward compatibility type. You also create several subjects within this registry, and they inherit the registry's Backward compatibility.
For a specific subject named user-events, you need stricter compatibility rules. You update the schema compatibility level for the user-events subject to Full.
In this situation, the following rules apply:
Any new schema version registered under the user-events subject must be both backwards- and forwards-compatible with the previously registered schema version for that subject.
Other subjects in the schema registry still adhere to the registry-level Backward compatibility setting unless their compatibility has been explicitly configured.
If you were to later change the schema registry's compatibility level to Forward, this change would apply to every subject that inherits the registry-level setting, including any new subjects created within the registry. However, the user-events subject would retain its explicitly set Full compatibility, as subject-level configurations override registry-level configurations.
This demonstrates how you can have a default compatibility level for the entire registry while also having the flexibility to define specific compatibility requirements for individual subjects based on your application needs.
For more information, see Update compatibility type.
About schema references
Schema references allow you to define common structures once and refer to them from multiple schemas. For example, an Address schema might be used as part of both a Customer and a Supplier schema.
This approach promotes reusability and consistency across your schemas. Additionally, using schema references creates clear dependencies, explicitly tracking which schemas rely on others. This improves the maintainability of your schema architecture.
When one schema needs to use another common schema, it includes a reference to that common schema. This relationship is formally defined by a SchemaReference structure.
A SchemaReference has the following components:
name (string): the fully qualified name of the schema being referenced for Avro formats or the filename of an imported type for Protobuf formats, as used within the schema definition itself.
subject (string): the name of the subject under which the referenced schema is registered in the schema registry.
version (int32): the specific version number of the referenced schema.
A schema that uses other schemas declares these dependencies in a references field. This field holds a list of SchemaReference objects.
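As an illustration, the structure described above might be modeled in Python as follows. The three SchemaReference fields come from the description above. The SchemaDefinition container, the placeholder schema string, and the example values are assumptions made for this sketch and do not represent a client library's types.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class SchemaReference:
    name: str     # Avro: fully qualified name; Protobuf: imported filename
    subject: str  # subject under which the referenced schema is registered
    version: int  # specific version of the referenced schema


@dataclass
class SchemaDefinition:
    """Hypothetical container pairing a schema with its declared dependencies."""
    schema: str
    references: List[SchemaReference] = field(default_factory=list)


address_ref = SchemaReference(
    name="com.example.common.Address",
    subject="address_schema_avro",
    version=1,
)
customer = SchemaDefinition(
    schema='{"type": "record", "name": "Customer", "fields": []}',  # placeholder body
    references=[address_ref],
)

# asdict() flattens the dataclasses into plain dicts and lists.
payload = asdict(customer)
```

Pinning a reference to a specific version means a later change to the referenced subject does not silently alter schemas that depend on it.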
Example for schema references
Assume you need to define schemas for both Customer
data and Supplier
data,
and both need to include an address. With schema references, you can define the
address structure once and reuse it.
To follow this example, see Create a subject.
Create a subject named address_schema, and register the definition for a standard address. When you create a subject for the first time, you are also creating version 1 of the schema for that subject.

Avro
Create and store this as subject address_schema_avro version 1.

    {
      "type": "record",
      "name": "Address",
      "namespace": "com.example.common",
      "fields": [
        {"name": "street", "type": "string"},
        {"name": "city", "type": "string"},
        {"name": "zipCode", "type": "string"},
        {"name": "country", "type": "string", "default": "USA"}
      ]
    }

Protobuf
Create and store this as subject address_schema_proto version 1.

    syntax = "proto3";

    package com.example.common;

    message Address {
      string street = 1;
      string city = 2;
      string zip_code = 3;
      string country = 4;
    }
Create the customer_schema schema. Instead of repeating the address fields, you reference the address_schema schema.

Avro
The billingAddress field's type com.example.common.Address refers to the Address schema defined in the previous step.

    {
      "type": "record",
      "name": "Customer",
      "namespace": "com.example.crm",
      "fields": [
        {"name": "customerId", "type": "long"},
        {"name": "customerName", "type": "string"},
        {"name": "billingAddress", "type": "com.example.common.Address"}
      ]
    }

When registering customer_schema_avro, its metadata would include a schema reference. Conceptual metadata for customer_schema_avro:

    "references": [
      {
        "name": "com.example.common.Address",
        "subject": "address_schema_avro",
        "version": 1
      }
    ]

Protobuf
The customer.proto file imports address.proto and uses com.example.common.Address for the billing_address field.

    syntax = "proto3";

    package com.example.crm;

    import "address.proto";

    message Customer {
      int64 customer_id = 1;
      string customer_name = 2;
      // This field's type refers to the imported Address message
      com.example.common.Address billing_address = 3;
    }

When registering customer_schema_proto, its metadata would include a schema reference. Conceptual metadata for customer_schema_proto:

    "references": [
      {
        "name": "address.proto",
        "subject": "address_schema_proto",
        "version": 1
      }
    ]
Similarly, for your Supplier schema, you would add a schema reference pointing to the same common Address schema.
About schema mode
Schema mode defines the operational state of a schema registry or a specific subject, and controls the types of modifications allowed. The schema mode can be applied to a registry or a specific subject within a schema registry. The following are the paths for the schema mode resources:
Registry-level mode: applies to the entire schema registry.
- Path: projects/project/locations/location/schemaRegistries/schema_registry/mode
Registry-level subject mode: applies to a specific subject within the entire schema registry.
- Path: projects/project/locations/location/schemaRegistries/schema_registry/mode/subject
The following modes are supported:
Readonly: in this mode, the schema registry or the specified subject or subjects cannot be updated. Modifications, such as updating configurations or adding new schema versions, are prevented.
Readwrite: this mode allows limited write operations on the schema registry or the specified subject or subjects. It enables modifications like updating configurations and adding new schema versions. This is the default mode for both new schema registries and new subjects.
When determining whether a modification is allowed for a specific subject, the mode set at the subject level takes precedence over the mode set at the schema registry level.
For example, if a schema registry is in Readonly mode, but a specific subject within it is in Readwrite mode, modifications to that specific subject are allowed. However, the creation of new subjects is restricted by the registry-level Readonly mode.
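The precedence rule above can be expressed as two small checks. This is a sketch for illustration only: can_modify_subject and can_create_subject are hypothetical helper names, and modes are modeled as plain strings rather than API resources.

```python
from typing import Dict


def can_modify_subject(subject: str, registry_mode: str, subject_modes: Dict[str, str]) -> bool:
    """An explicit subject-level mode takes precedence over the registry-level mode."""
    effective_mode = subject_modes.get(subject, registry_mode)
    return effective_mode == "Readwrite"


def can_create_subject(registry_mode: str) -> bool:
    """Creating new subjects is governed by the registry-level mode."""
    return registry_mode == "Readwrite"
```

With the registry in Readonly mode and one subject explicitly set to Readwrite, can_modify_subject returns True for that subject and False for subjects without an override, while can_create_subject returns False.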
Example for schema mode
Consider a schema registry with mode set to Readwrite. This configuration means you can add new subjects to the registry and new schema versions to existing subjects.
Assume that you have a subject named production-config that you want to protect from accidental changes. You set the mode for the production-config subject to Readonly. As a result, the following conditions apply to the production-config subject:
You cannot add new schema versions to the subject.
You cannot update the configuration (like compatibility type) for the subject.
Other subjects in the registry that don't have their own mode explicitly set remain in Readwrite mode, so you can still modify them.
You can continue to create subjects in the registry because the registry itself is still in Readwrite mode.
Later, you might decide to put the entire schema registry into a maintenance state by setting the registry-level mode to Readonly. However, you have another subject, staging-config, which needs to remain modifiable for ongoing testing. You explicitly set the mode for the staging-config subject to Readwrite. As a result, the following conditions apply:
The schema registry is now Readonly. You cannot create new subjects.
Existing subjects without a specific mode override also become Readonly by inheritance. You cannot add new schema versions to them or update their configurations.
The production-config subject remains Readonly as explicitly set.
The staging-config subject remains in Readwrite mode because its subject-level setting overrides the registry-level Readonly mode. You can continue to add schema versions and update configurations for staging-config.
This hierarchical approach provides flexibility in managing schema modifications at different levels of granularity.
For more information about how to update the schema mode, see Update schema mode.
Best practices
Don't use None as a compatibility type strategy because you run the risk of breaking clients with schema changes.
Choose a forward-based strategy such as Forward or Forward_transitive if you want to update producers first. Choose a backward-based strategy such as Backward or Backward_transitive if you want to update consumers first.
Choose a transitive strategy if you want to maintain compatibility with multiple previous schema versions. If you want to maximize compatibility and minimize the risk of breaking clients when updating schema versions, use the Full_transitive strategy.
Disable automatic schema registration (auto.register.schemas=false) in production environments. Manage schema evolution deliberately through code reviews, testing, and controlled deployment processes.
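The guidance on choosing a compatibility type can be condensed into a lookup. The helper below is a hypothetical illustration of that guidance, not part of any client library. The "any" option reflects the earlier statement that Full lets clients be upgraded in any order.

```python
def recommend_compatibility(upgrade_first: str, transitive: bool = False) -> str:
    """Map a rollout order to a compatibility type.

    upgrade_first: "consumers", "producers", or "any" (no fixed order).
    transitive: True to stay compatible with all previous versions.
    """
    base = {
        "consumers": "Backward",  # consumers are upgraded before producers
        "producers": "Forward",   # producers are upgraded before consumers
        "any": "Full",            # clients can be upgraded in any order
    }[upgrade_first]
    return f"{base}_transitive" if transitive else base
```

For instance, a team that always deploys producers first and must keep every historical version readable would get Forward_transitive from this helper.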