Import BigQuery to Discovery Engine Error

I am trying to import a BigQuery dataset into a Discovery Engine DataStore, but I get the following error:
invalid JSON in google.cloud.discoveryengine.v1main.Document @ content: message google.cloud.discoveryengine.v1main.Document.Content, near 1:56 (offset 55): unexpected character: '"'; expected '{'

I have the following query to remove any semicolons:

REGEXP_REPLACE(summary, r';+', '')
 
I can't seem to fix this error, and I would appreciate any help.

Accepted solution

Hi @JackJonesTrellx,

Welcome to the Google Cloud Community!

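It looks like your BigQuery table supplies the content field as a plain string, but Discovery Engine expects content to be a JSON object (the Document.Content message named in the error). The parser found a '"' where it expected a '{', which is what produces the "expected '{'" error.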
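
To make the mismatch concrete, here is a minimal Python sketch of the two shapes. The row values are made up, and the 'text' key only mirrors the JSON_OBJECT suggestion in this answer; it is an assumption, not a confirmed part of the Document.Content schema.

```python
import json

# Hypothetical rows; the ids and text are made-up sample values.
bad_row = {"id": "doc-1", "content": "plain summary text"}
good_row = {"id": "doc-1", "content": {"text": "plain summary text"}}


def first_content_char(row):
    """Return the first character of the serialized content value."""
    return json.dumps(row["content"])[0]


print(first_content_char(bad_row))   # a string serializes with a leading '"'
print(first_content_char(good_row))  # an object serializes with a leading '{'
```

A parser that expects an object at content stops at the first character, which matches the "unexpected character: '"'; expected '{'" wording in the error.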

Here are some approaches that might help with your use case:

  • Construct proper JSON for the content field: You can use TO_JSON_STRING(STRUCT(...)) in BigQuery to convert your data into a well-formed JSON string. STRUCT(...) lets you name the fields, for example summary AS content_summary and description AS content_description, which may also help search relevance.
  • Wrap the value in a JSON object: You can use JSON_OBJECT('text', your_summary_column) in your BigQuery query so that the content field contains a JSON object rather than a bare string. If you still need the semicolon cleanup, put the REGEXP_REPLACE call inside JSON_OBJECT.
  • Check the data type mapping: The Discovery Engine documentation describes how data types must be mapped when importing from BigQuery. For example, timestamps must follow a specific format.
  • Cast id to a string: You can use `CAST(your_id_column AS STRING) AS id` to ensure the `id` field is stored as a string.
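
The wrapping, semicolon cleanup, and id cast from the list above can also be sketched outside BigQuery. This is a minimal Python sketch under stated assumptions, not Discovery Engine's API: build_row and check_row are hypothetical helpers, the 'text' key simply follows the JSON_OBJECT suggestion, and the sample values are made up.

```python
import re


def build_row(raw_id, summary):
    """Build one row: id cast to a string, content wrapped in an object."""
    cleaned = re.sub(r";+", "", summary)  # same pattern as the question's REGEXP_REPLACE
    return {"id": str(raw_id), "content": {"text": cleaned}}


def check_row(row):
    """Return a list of shape problems; an empty list means none were found."""
    problems = []
    if not isinstance(row.get("id"), str):
        problems.append("id is not a string")
    if not isinstance(row.get("content"), dict):
        problems.append("content is not a JSON object")
    return problems


row = build_row(42, "first part;; second part;")
print(row)
print(check_row(row))
print(check_row({"id": 42, "content": "plain text"}))
```

Running check_row over a sample of exported rows before starting an import may surface a string-valued content or a numeric id early, though it only checks these two shapes and is no substitute for the schema in the Discovery Engine documentation.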
