Amazon Kendra - Developer Guide

Suggested answer

Add machine learning generated answers to your users' queries. For example, 'How difficult is this course?'. Amazon Kendra can retrieve the most relevant text across all documents referring to a course's difficulty and suggest the most relevant answer.

FAQ

Add a FAQ document to provide answers to frequently asked questions. For example, 'How many

hours to complete this course?'. Amazon Kendra can use the FAQ document containing the answer

to this question and give the correct answer.

Sort

Add sorting of the search results so that your users can organize the results by relevancy, created

time, last updated time, and other sorting criteria.

Documents

Configure how documents or search results are displayed on your search page. You can configure how many results display on the page, include pagination such as page numbers, activate a user feedback button, and arrange how document metadata fields are displayed in a search result.

Language

Select a language to filter the search results or documents by that language.

Search box

Configure the size and placeholder text of your search box, and allow query suggestions.

Relevance tuning

Add boosting to document metadata fields to place more weight on these fields when your users search for documents. You can set a weight from 1 to 10. You can boost text, date, and numeric field types. For example, to give _last_updated_at and _created_at more importance than other fields, give these fields a weight of 1 to 10, depending on their importance. You can apply different relevance tuning configurations for each search application or experience.
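As a rough sketch of how such a weight might be expressed through the API rather than in the Experience Builder, the helper below builds one metadata field entry in the shape that the UpdateIndex operation's DocumentMetadataConfigurationUpdates parameter is understood to accept. The helper name `boost_field`, the example weights, and the range check are illustrative assumptions, not part of the SDK.

```python
def boost_field(name, field_type, importance):
    """Build one metadata-field entry with a relevance weight (hypothetical helper)."""
    # The guide describes weights from 1 to 10, so reject anything else early.
    if not 1 <= importance <= 10:
        raise ValueError("importance must be between 1 and 10")
    return {
        "Name": name,
        "Type": field_type,
        "Relevance": {"Importance": importance},
    }


# Example weights are assumptions chosen for illustration only.
updates = [
    boost_field("_last_updated_at", "DATE_VALUE", 8),
    boost_field("_created_at", "DATE_VALUE", 5),
]

# With a real client, the list could then be passed along these lines:
# boto3.client("kendra").update_index(
#     Id="index-id", DocumentMetadataConfigurationUpdates=updates)
```

Validating the range before calling the service keeps an out-of-range weight from turning into a failed request.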

Providing access to your search page

Access to your search experience is through IAM Identity Center. When you conﬁgure your search

experience, you grant other people listed in your Identity Center directory access to your Amazon

Kendra search page. They receive an email that directs them to sign in using their credentials

in IAM Identity Center to access the search page. You must set up IAM Identity Center at the

organization level or account holder level in AWS Organizations. For more information on setting

up IAM Identity Center, see Getting started with IAM Identity Center.

You activate user identities in IAM Identity Center with your search experience and assign Viewer or

Owner access permissions using the API or the console.

• Viewer: Allowed to issue queries, receive suggested answers relevant to their search, and

contribute their feedback to Amazon Kendra so that it keeps improving the search.

• Owner: Allowed to customize the design of the search page, tune the search, and use the search

application as a Viewer. Disabling access to viewers in the console is currently not supported.

To assign other people access to your search experience, you first activate user identities in IAM Identity Center with your Amazon Kendra experience by using the ExperienceConfiguration object. You specify the field name that contains the identifiers of your users, such as user name or email address. You then grant your list of users access to your search experience using the AssociateEntitiesToExperience API and define their permissions as Viewer or Owner using the AssociatePersonasToEntities API. You specify each user or group using the EntityConfiguration object and whether that user or group is a Viewer or Owner using the EntityPersonaConfiguration object.
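The two API calls described above can be sketched in Python as follows. This is a minimal illustration that assumes a Boto3 Amazon Kendra client is passed in as `kendra`; the function name `grant_access` and the shape of the `assignments` argument are hypothetical, and the sketch does not inspect the failed-entity lists that the service may return.

```python
def grant_access(kendra, experience_id, index_id, assignments):
    """Associate users or groups with an experience, then assign their personas.

    assignments maps an Identity Center entity ID to a tuple of
    (entity_type, persona), for example {"user-id-1": ("USER", "OWNER")}.
    """
    entities = [
        {"EntityId": entity_id, "EntityType": entity_type}
        for entity_id, (entity_type, _) in assignments.items()
    ]
    personas = [
        {"EntityId": entity_id, "Persona": persona}
        for entity_id, (_, persona) in assignments.items()
    ]
    # Grant access first, then define each entity as VIEWER or OWNER.
    kendra.associate_entities_to_experience(
        Id=experience_id, IndexId=index_id, EntityList=entities
    )
    kendra.associate_personas_to_entities(
        Id=experience_id, IndexId=index_id, Personas=personas
    )
    return entities, personas
```

A call might look like `grant_access(boto3.client("kendra"), "experience-id", "index-id", {"user-id-1": ("USER", "VIEWER")})`, where the IDs are placeholders.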

To assign other people access to your search experience using the console, you first need to create an experience and confirm your identity and that you are an owner. Then you can assign other users or groups as viewers or owners. In the console, select your index and then select Experiences in the navigation menu. After you create your experience, you can select your experience from the list. Go to Access management to assign users or groups as viewers or owners.

Configuring a search experience

The following is an example of configuring or creating a search experience.

Console

To create an Amazon Kendra search experience

1. In the left navigation pane, under Indexes, select Experiences and then select Create

experience.

2. On the Configure experience page, enter a name and description for your experience, choose your content sources, and choose the IAM role for your experience. For more information on IAM roles, see IAM roles for Amazon Kendra experiences.

3. On the Confirm your identity from an Identity Center directory page, select your user ID, such as your email. If you do not have an Identity Center directory, enter your full name and email to create an Identity Center directory. This includes you as a user of the experience and automatically assigns you owner access rights.

4. On the Review to open Experience Builder page, review your configuration details and select Create experience and open Experience Builder to start editing your search page.

CLI

To create an Amazon Kendra experience

aws kendra create-experience \
 --name experience-name \
 --description "experience description" \
 --index-id index-id \
 --role-arn arn:aws:iam::account-id:role/role-name \
 --configuration '{"ContentSourceConfiguration":{"DataSourceIds":["data-source-1","data-source-2"]},"UserIdentityConfiguration":{"IdentityAttributeName":"identity-attribute-name"}}'

aws kendra describe-experience \
 --id experience-id \
 --index-id index-id

Python

To create an Amazon Kendra experience

import boto3
from botocore.exceptions import ClientError
import pprint
import time

kendra = boto3.client("kendra")

print("Create an experience.")

# Provide a name for the experience
name = "experience-name"
# Provide an optional description for the experience
description = "experience description"
# Provide the index ID for the experience
index_id = "index-id"
# Provide the IAM role ARN required for Amazon Kendra experiences
role_arn = "arn:aws:iam::${account-id}:role/${role-name}"
# Configure the content sources and the identity attribute for the experience
configuration = {
    "ContentSourceConfiguration": {"DataSourceIds": ["data-source-1", "data-source-2"]},
    "UserIdentityConfiguration": {"IdentityAttributeName": "identity-attribute-name"}
}

try:
    experience_response = kendra.create_experience(
        Name = name,
        Description = description,
        IndexId = index_id,
        RoleArn = role_arn,
        Configuration = configuration
    )

    pprint.pprint(experience_response)

    # CreateExperience returns the ID of the new experience
    experience_id = experience_response["Id"]

    print("Wait for Amazon Kendra to create the experience.")

    while True:
        # Get the details of the experience, such as the status
        experience_description = kendra.describe_experience(
            Id = experience_id,
            IndexId = index_id
        )
        status = experience_description["Status"]
        print(" Creating experience. Status: "+status)
        # Stop polling as soon as the experience leaves the CREATING state
        if status != "CREATING":
            break
        time.sleep(60)

except ClientError as e:
    print("%s" % e)

print("Program ends.")

Java

To create an Amazon Kendra experience

package com.amazonaws.kendra;

import java.util.concurrent.TimeUnit;

import software.amazon.awssdk.services.kendra.KendraClient;
import software.amazon.awssdk.services.kendra.model.ContentSourceConfiguration;
import software.amazon.awssdk.services.kendra.model.CreateExperienceRequest;
import software.amazon.awssdk.services.kendra.model.CreateExperienceResponse;
import software.amazon.awssdk.services.kendra.model.DescribeExperienceRequest;
import software.amazon.awssdk.services.kendra.model.DescribeExperienceResponse;
import software.amazon.awssdk.services.kendra.model.ExperienceConfiguration;
import software.amazon.awssdk.services.kendra.model.ExperienceStatus;
import software.amazon.awssdk.services.kendra.model.UserIdentityConfiguration;

public class CreateExperienceExample {

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Create an experience");

        String experienceName = "experience-name";
        String experienceDescription = "experience description";
        String indexId = "index-id";
        String experienceRoleArn = "arn:aws:iam::account-id:role/role-name";

        KendraClient kendra = KendraClient.builder().build();

        CreateExperienceRequest createExperienceRequest = CreateExperienceRequest
            .builder()
            .name(experienceName)
            .description(experienceDescription)
            .indexId(indexId)
            .roleArn(experienceRoleArn)
            .configuration(
                ExperienceConfiguration
                    .builder()
                    .contentSourceConfiguration(
                        ContentSourceConfiguration
                            .builder()
                            .dataSourceIds("data-source-1", "data-source-2")
                            .build()
                    )
                    .userIdentityConfiguration(
                        UserIdentityConfiguration
                            .builder()
                            .identityAttributeName("identity-attribute-name")
                            .build()
                    )
                    .build()
            )
            .build();

        CreateExperienceResponse createExperienceResponse = kendra.createExperience(createExperienceRequest);
        System.out.println(String.format("Experience response %s", createExperienceResponse));

        // CreateExperience returns the ID of the new experience
        String experienceId = createExperienceResponse.id();

        System.out.println(String.format("Wait for Kendra to create the experience %s.", experienceId));

        while (true) {
            DescribeExperienceRequest describeExperienceRequest = DescribeExperienceRequest
                .builder()
                .id(experienceId)
                .indexId(indexId)
                .build();
            DescribeExperienceResponse describeExperienceResponse = kendra.describeExperience(describeExperienceRequest);
            ExperienceStatus status = describeExperienceResponse.status();
            // Stop polling as soon as the experience leaves the CREATING state
            if (status != ExperienceStatus.CREATING) {
                break;
            }
            TimeUnit.SECONDS.sleep(60);
        }

        System.out.println("Experience creation is complete.");
    }
}

Adjusting capacity

Amazon Kendra provides resources for your index in capacity units. Each capacity unit provides

additional resources for your index. There are separate capacity units for document storage and for

queries. You can only add capacity units to Amazon Kendra Enterprise Edition indexes. You can't

add capacity to a Developer Edition index.

A document storage capacity unit provides the following additional storage for your index.

• 100,000 documents or 30 GB of storage.

A query capacity unit provides the following additional queries for your index.

• 0.1 queries per second or approximately 8,000 queries per day.

Each index comes with a base capacity equal to 1 capacity unit (30 GB of storage and 0.1 queries

per second). There is an additional cost for each additional capacity unit. For details, see Amazon

Kendra pricing.

You can add up to 100 extra capacity units to your storage and query resources for an index. If you need more units, contact Support.

You can adjust capacity units up to 5 times per day to fit your usage requirements. You can't reduce document storage capacity below the number of documents stored in your index. For example, if you are storing 150,000 documents, you can't reduce the storage capacity below 1 additional unit, because the base unit alone holds only 100,000 documents.
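The capacity arithmetic in this section can be sketched as a small calculation. The function names are hypothetical, and the sketch considers only the document-count and queries-per-second figures quoted above; it ignores the 30 GB storage figure.

```python
import math

DOCS_PER_STORAGE_UNIT = 100_000  # documents per storage capacity unit
QPS_PER_QUERY_UNIT = 0.1         # queries per second per query capacity unit


def index_capacity(extra_storage_units, extra_query_units):
    """Return (max documents, queries per second), including the base unit."""
    max_documents = (1 + extra_storage_units) * DOCS_PER_STORAGE_UNIT
    queries_per_second = (1 + extra_query_units) * QPS_PER_QUERY_UNIT
    return max_documents, queries_per_second


def min_extra_storage_units(document_count):
    """Smallest number of additional storage units that still holds the documents."""
    extra_documents = max(0, document_count - DOCS_PER_STORAGE_UNIT)
    return math.ceil(extra_documents / DOCS_PER_STORAGE_UNIT)
```

For the example in the text, `min_extra_storage_units(150_000)` evaluates to 1, which matches the statement that storage can't be reduced below 1 additional unit.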

You can view the resources an index is using in the console by selecting the name of the index to

open the index settings and other information, or you can use the DescribeIndex API.

Amazon Kendra also returns exceptions when you exceed the capacity of an index. You get a ServiceQuotaExceededException when the total extracted size of all the documents exceeds the limit for an index. You get an InvalidRequest for each document when the number of documents exceeds the limit for an index. You get a ThrottlingException when the number of queries per second exceeds the limit. For more information on limits, see Quotas for Amazon Kendra.
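One way an application might react to a ThrottlingException is to retry with exponential backoff. The following is a minimal sketch under stated assumptions: the helper name `call_with_backoff` and its parameters are hypothetical, and it relies on the convention that botocore's ClientError exposes the service error code in `error.response["Error"]["Code"]`. Boto3 also has its own configurable retry behavior, which may be sufficient on its own.

```python
import time


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on ThrottlingException."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            # Exceptions without a response attribute yield an empty code.
            code = getattr(error, "response", {}).get("Error", {}).get("Code")
            # Re-raise anything that is not throttling, or the final failed attempt.
            if code != "ThrottlingException" or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A query could then be wrapped as `call_with_backoff(lambda: kendra.query(IndexId="index-id", QueryText="search term"))`, where `kendra` is a Boto3 client and the arguments are placeholders.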

Accumulated queries will last up to 24 hours.

Viewing capacity

View the resources that your index is using with the Amazon Kendra console by selecting the name

of your index to access the details. The console also provides usage graphs so you can determine

how much storage and query capacity your index uses. You can use this information to help you

plan when to add additional capacity.

To view document storage and query use (console)

1. Sign into the AWS Management Console and open the Amazon Kendra console at https://

console.aws.amazon.com/kendra/home.

2. From the list of indexes, choose the index you want to access.

3. Scroll to the settings section to view the current total document storage and query capacity.

To view capacity using the Amazon Kendra API, use the CapacityUnits parameter in the

DescribeIndex API.
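A minimal sketch of reading the capacity units with the SDK for Python follows. It assumes a Boto3 Amazon Kendra client is passed in as `kendra`; the function name is hypothetical, and treating a missing CapacityUnits entry as zero is a defensive assumption rather than documented behavior.

```python
def get_capacity_units(kendra, index_id):
    """Return (storage units, query units) reported by DescribeIndex."""
    response = kendra.describe_index(Id=index_id)
    # Default to zero if the response carries no CapacityUnits entry (assumption).
    units = response.get("CapacityUnits", {})
    return (
        units.get("StorageCapacityUnits", 0),
        units.get("QueryCapacityUnits", 0),
    )
```

For example, `get_capacity_units(boto3.client("kendra"), "index-id")` would return the two values for the placeholder index ID.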

Adding and removing capacity

If you need additional capacity for your index, you can add it using the console or the Amazon

Kendra API.

To add or remove storage or query capacity (console)

1. Sign into the AWS Management Console and open the Amazon Kendra console at https://

console.aws.amazon.com/kendra/home.

2. From the list of indexes, choose the index that you want to access.

3. Select Edit, or select Edit from the Actions dropdown.

4. Select Next to go to the provisioning details page.

5. Add or remove document storage and/or query capacity units.

6. Continue to select Next to go to the review page and then select Update to save your changes.

After you update the capacity of your index, it can take several minutes for the changes to take effect.

To add or remove capacity using the Amazon Kendra API, use the CapacityUnits parameter in

the UpdateIndex API.
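The update can be sketched in Python as follows, again assuming a Boto3 Amazon Kendra client is passed in as `kendra`. The function name and the non-negative check are illustrative assumptions.

```python
def set_capacity_units(kendra, index_id, storage_units, query_units):
    """Set the additional storage and query capacity units for an index."""
    if storage_units < 0 or query_units < 0:
        raise ValueError("capacity units must not be negative")
    capacity = {
        "StorageCapacityUnits": storage_units,
        "QueryCapacityUnits": query_units,
    }
    # Both values are sent together in a single CapacityUnits structure.
    kendra.update_index(Id=index_id, CapacityUnits=capacity)
    return capacity
```

Because both values travel in one structure, a caller that wants to change only one of them would first read the current values and pass the other one back unchanged.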

Amazon Kendra Intelligent Ranking capacity

A capacity unit provides the following additional rescore requests per second for a rescore

execution plan. A rescore execution plan is a resource used to provision the Rescore API.

• 0.01 requests per second.

Each rescore execution plan comes with a base capacity equal to 1 capacity unit (0.01 requests

per second). There is an additional cost for each additional capacity unit. For details, see Amazon

Kendra pricing.

You can add up to 1000 extra capacity units for a rescore execution plan. If you need more units,

simply contact Support.

Query suggestions capacity

When using query suggestions, there's a base query capacity of 2.5 GetQuerySuggestions calls per second. The GetQuerySuggestions capacity is five times the provisioned query capacity for an index, or the base capacity of 2.5 calls per second, whichever is higher. For example, the base capacity for an index is 0.1 queries per second, and GetQuerySuggestions capacity has a base of 2.5 calls per second. If you add another 0.1 queries per second for a total of 0.2 queries per second for an index, the GetQuerySuggestions capacity is still 2.5 calls per second, because 2.5 is higher than five times 0.2 queries per second.
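The rule described above reduces to taking the larger of two values. A short sketch, with a hypothetical function name:

```python
BASE_SUGGESTIONS_CAPACITY = 2.5  # GetQuerySuggestions calls per second


def suggestions_capacity(provisioned_qps):
    """Return the GetQuerySuggestions capacity for a provisioned query rate."""
    # Five times the provisioned rate, but never below the base capacity.
    return max(BASE_SUGGESTIONS_CAPACITY, 5 * provisioned_qps)
```

Under this rule the capacity stays at the base value until the provisioned query capacity exceeds 0.5 queries per second.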

Amazon Kendra experience capacity

Search experience capacity

Amazon Kendra starts to throttle Query, GetQuerySuggestions, and SubmitFeedback requests for your Amazon Kendra experience at 15 requests per second, or at 40 requests per second with query bursting. These limits still apply to an index with more than 150 query capacity units.

For example, if your index has 150 query capacity units, your search experience application can handle 15 requests per second. If you scale to 200 query capacity units, your search experience application still handles only 15 requests per second. If you limit your index to 100 query capacity units, your search experience application handles only 10 requests per second.
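The three cases in this example follow from capping the index's query throughput at the experience limit. A minimal sketch, assuming (as the example does) that each query capacity unit contributes 0.1 requests per second; the function name is hypothetical and query bursting is not modeled:

```python
EXPERIENCE_LIMIT_RPS = 15  # throttling threshold without query bursting


def experience_throughput(query_capacity_units):
    """Requests per second a search experience can handle for an index."""
    # Each unit contributes 0.1 requests per second; dividing by 10 avoids
    # floating-point rounding noise in the result.
    return min(EXPERIENCE_LIMIT_RPS, query_capacity_units / 10)
```

With this rule, adding query capacity units beyond 150 does not raise the throughput of the search experience.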

Adaptive query bursting

Amazon Kendra has a provisioned base capacity of 1 query capacity unit. You can use up to 8,000

queries per day with a minimum throughput of 0.1 queries per second (per query capacity unit).

Accumulated queries will last up to 24 hours and can accommodate bursts of traffic. The amount

of burst allowed varies because it depends on the cluster's load at any given time. Provision enough

query capacity units to handle your peak load levels.

Amazon Kendra's built-in adaptive query bursting handles unexpected bursts of traffic beyond the provisioned throughput. Adaptive query bursting is available in the Enterprise Edition of Amazon Kendra.

Adaptive query bursting is a built-in capability that allows you to apply unused query capacity to handle unexpected traffic. Amazon Kendra accumulates your unused queries at your provisioned queries per second rate, every second, up to the maximum number of queries you've provisioned for your Amazon Kendra index. These accumulated queries are used for unexpected traffic above the allocated capacity. Optimal performance of adaptive query bursting can vary, depending on several factors such as your total index size, query complexity, accumulated unused queries, and overall load on your index. Run your own load tests to accurately measure bursting capacity.
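To build intuition for accumulated capacity, the following sketch models it as a simple token bucket. This is a simplified illustration only, not Amazon Kendra's actual algorithm: the real burst allowance also depends on factors such as cluster load, and the function name, the `max_accumulated` cap, and the per-second demand list are all assumptions made for the example.

```python
def simulate_bursting(provisioned_qps, max_accumulated, demand_per_second):
    """Simulate query admission with accumulated unused capacity.

    Returns one (served, throttled) pair for each second of demand.
    """
    accumulated = 0.0
    results = []
    for demand in demand_per_second:
        # Each second adds the provisioned rate to the available budget.
        available = accumulated + provisioned_qps
        served = min(demand, int(available))  # only whole queries are served
        results.append((served, demand - served))
        # Unused budget carries over, capped at the accumulation limit.
        accumulated = min(max_accumulated, available - served)
    return results
```

For instance, with a provisioned rate of 1 query per second and three idle seconds followed by a burst of 5 queries, `simulate_bursting(1, 10, [0, 0, 0, 5])` serves 4 queries in the last second and throttles 1.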

Getting started

This section shows you how to create a data source and add your documents to an Amazon Kendra

index. Instructions are provided for the AWS console, the AWS CLI, a Python program using the

AWS SDK for Python (Boto3), and a Java program using the AWS SDK for Java.

Topics

• Prerequisites

• Getting started with the Amazon Kendra console

• Getting started (AWS CLI)

• Getting started (AWS SDK for Python (Boto3))

• Getting started (AWS SDK for Java)

• Getting started with an Amazon S3 data source (console)

• Getting started with a MySQL database data source (console)

• Getting started with an AWS IAM Identity Center identity source (console)

Prerequisites

The following steps are prerequisites for the getting started exercises. The steps show you how to

set up your account, create an IAM role that gives Amazon Kendra permission to make calls on your

behalf, and index documents from an Amazon S3 bucket. An S3 bucket is used as an example, but

you can use other data sources that Amazon Kendra supports. See Data sources.

Sign up for an AWS account

If you do not have an AWS account, complete the following steps to create one.

To sign up for an AWS account

1. Open https://portal.aws.amazon.com/billing/signup.

2. Follow the online instructions.

Part of the sign-up procedure involves receiving a phone call and entering a verification code

on the phone keypad.

When you sign up for an AWS account, an AWS account root user is created. The root user

has access to all AWS services and resources in the account. As a security best practice, assign

administrative access to a user, and use only the root user to perform tasks that require root

user access.

AWS sends you a confirmation email after the sign-up process is complete. At any time, you can

view your current account activity and manage your account by going to https://aws.amazon.com/

and choosing My Account.

Create a user with administrative access

After you sign up for an AWS account, secure your AWS account root user, enable AWS IAM Identity

Center, and create an administrative user so that you don't use the root user for everyday tasks.

Secure your AWS account root user

1. Sign in to the AWS Management Console as the account owner by choosing Root user and

entering your AWS account email address. On the next page, enter your password.

For help signing in by using root user, see Signing in as the root user in the AWS Sign-In User

Guide.

2. Turn on multi-factor authentication (MFA) for your root user.

For instructions, see Enable a virtual MFA device for your AWS account root user (console) in

the IAM User Guide.

Create a user with administrative access

1. Enable IAM Identity Center.

For instructions, see Enabling AWS IAM Identity Center in the AWS IAM Identity Center User

Guide.

2. In IAM Identity Center, grant administrative access to a user.

For a tutorial about using the IAM Identity Center directory as your identity source, see

Configure user access with the default IAM Identity Center directory in the AWS IAM Identity

Center User Guide.

Sign in as the user with administrative access

• To sign in with your IAM Identity Center user, use the sign-in URL that was sent to your email

address when you created the IAM Identity Center user.

For help signing in using an IAM Identity Center user, see Signing in to the AWS access portal in

the AWS Sign-In User Guide.

Assign access to additional users

1. In IAM Identity Center, create a permission set that follows the best practice of applying least-

privilege permissions.

For instructions, see Create a permission set in the AWS IAM Identity Center User Guide.

2. Assign users to a group, and then assign single sign-on access to the group.

For instructions, see Add groups in the AWS IAM Identity Center User Guide.

• If you are using an S3 bucket containing documents to test Amazon Kendra, create an S3 bucket in the same Region where you are using Amazon Kendra. For instructions, see Creating and Configuring an S3 Bucket in the Amazon Simple Storage Service User Guide.

Upload your documents to your S3 bucket. For instructions, see Uploading, Downloading, and

Managing Objects in the Amazon Simple Storage Service User Guide.

If you are using another data source, you must have an active site and credentials to connect to

the data source.

If you are using the console to get started, start with Getting started with the Amazon Kendra

console.

Amazon Kendra resources: AWS CLI, SDK, console

You need certain permissions to use Amazon Kendra with the AWS CLI, an SDK, or the console.

To use Amazon Kendra with the CLI, SDK, or console, you must have permissions that allow Amazon Kendra to create and manage resources on your behalf. Depending on your use case, these permissions include access to the Amazon Kendra API itself, AWS KMS keys if you want to encrypt your data through a custom CMK, and the Identity Center directory if you want to integrate with AWS IAM Identity Center or create a Search Experience. For a full list of permissions for different use cases, see IAM roles.

First, you must attach the following permissions to your IAM user.

{

"Version": "2012-10-17",

"Statement": [

{

"Sid": "Stmt1644430853544",

"Action": [

"kms:CreateGrant",

"kms:DescribeKey"

],

"Effect": "Allow",

"Resource": "*"

},

{

"Sid": "Stmt1644430878150",

"Action": "kendra:*",

"Effect": "Allow",

"Resource": "*"

},

{

"Sid": "Stmt1644430973706",

"Action": [

"sso:AssociateProfile",

"sso:CreateManagedApplicationInstance",

"sso:DeleteManagedApplicationInstance",

"sso:DisassociateProfile",

"sso:GetManagedApplicationInstance",

"sso:GetProfile",

"sso:ListDirectoryAssociations",

"sso:ListProfileAssociations",

"sso:ListProfiles"

],

"Effect": "Allow",

"Resource": "*"

},

{

"Sid": "Stmt1644430999558",

"Action": [

"sso-directory:DescribeGroup",

"sso-directory:DescribeGroups",

"sso-directory:DescribeUser",

"sso-directory:DescribeUsers"

],

"Effect": "Allow",

"Resource": "*"

},

{

"Sid": "Stmt1644431025960",

"Action": [

"identitystore:DescribeGroup",

"identitystore:DescribeUser",

"identitystore:ListGroups",

"identitystore:ListUsers"

],

"Effect": "Allow",

"Resource": "*"

}

]

}

Second, if you use the CLI or SDK, you must also create an IAM role and policy to access Amazon

CloudWatch Logs. If you are using the console, you don't need to create an IAM role and policy for

this. You create this as part of the console procedure.

To create an IAM role and policy for the AWS CLI and SDK that allows Amazon Kendra to access

your Amazon CloudWatch Logs.

1. Sign in to the AWS Management Console and open the IAM console at https://

console.aws.amazon.com/iam/.

2. From the left menu, choose Policies and then choose Create policy.

3. Choose JSON and then replace the default policy with the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:namespace": "AWS/Kendra"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogGroups"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:region:account ID:log-group:/aws/kendra/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:region:account ID:log-group:/aws/kendra/*:log-stream:*"
            ]
        }
    ]
}

4. Choose Review policy.

5. Name the policy "KendraPolicyForGettingStartedIndex" and then choose Create policy.

6. From the left menu, choose Roles and then choose Create role.

7. Choose Another AWS account and then type your account ID in Account ID. Choose Next:

Permissions.

8. Choose the policy that you created above and then choose Next: Tags

9. Don't add any tags. Choose Next: Review.

10. Name the role "KendraRoleForGettingStartedIndex" and then choose Create role.

11. Find the role that you just created. Choose the role name to open the summary. Choose Trust

relationships and then choose Edit trust relationship.

12. Replace the existing trust relationship with the following:

{

"Version": "2012-10-17",

"Statement": [

{

"Effect": "Allow",

"Principal": {

"Service": "kendra.amazonaws.com"

},

"Action": "sts:AssumeRole"

}

]

}

13. Choose Update trust policy.
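The same role could also be created programmatically instead of through the console. The following is a hedged sketch, assuming a Boto3 IAM client is passed in as `iam` and that the permissions policy already exists; the function names are hypothetical. Unlike the console procedure, the sketch sets the Amazon Kendra trust relationship when the role is created, so there is no separate step to edit it afterward.

```python
import json


def kendra_trust_policy():
    """Trust policy that lets the Amazon Kendra service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "kendra.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_kendra_role(iam, role_name, policy_arn):
    """Create a role trusted by Amazon Kendra and attach an existing policy."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(kendra_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```

A call such as `create_kendra_role(boto3.client("iam"), "KendraRoleForGettingStartedIndex", policy_arn)` would return the role ARN to use when creating the index, where `policy_arn` is the ARN of the policy created in the earlier steps.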

Third, if you use an Amazon S3 bucket to store your documents or you are using S3 to test Amazon Kendra, you must also create an IAM role and policy to access your bucket. If you are using another data source, see IAM roles for data sources.

To create an IAM role and policy that allows Amazon Kendra to access and index your Amazon

S3 bucket.

1. Sign in to the AWS Management Console and open the IAM console at https://

console.aws.amazon.com/iam/.

2. From the left menu, choose Policies and then choose Create policy.

3. Choose JSON and then replace the default policy with the following:

{

"Version": "2012-10-17",

"Statement": [

{

"Action": [

"s3:GetObject"

],

"Resource": [

"arn:aws:s3:::bucket name/*"

],

"Effect": "Allow"

},

{

"Action": [

"s3:ListBucket"

],

"Resource": [

"arn:aws:s3:::bucket name"

],

"Effect": "Allow"

},

{

"Effect": "Allow",

"Action": [

"kendra:BatchPutDocument",

"kendra:BatchDeleteDocument"

],

"Resource": "arn:aws:kendra:region:account ID:index/*"

}

]

}

4. Choose Review policy.

5. Name the policy "KendraPolicyForGettingStartedDataSource" and then choose Create policy.

6. From the left menu, choose Roles and then choose Create role.

7. Choose Another AWS account and then type your account ID in Account ID. Choose Next:

Permissions.

8. Choose the policy that you created above and then choose Next: Tags

9. Don't add any tags. Choose Next: Review.

10. Name the role "KendraRoleForGettingStartedDataSource" and then choose Create role.

11. Find the role that you just created. Choose the role name to open the summary. Choose Trust

relationships and then choose Edit trust relationship.

12. Replace the existing trust relationship with the following:

{

"Version": "2012-10-17",

"Statement": [

{

"Effect": "Allow",

"Principal": {

"Service": "kendra.amazonaws.com"

},

"Action": "sts:AssumeRole"

}

]

}

13. Choose Update trust policy.

Depending on how you want to use the Amazon Kendra API, do one of the following.

• Getting started (AWS CLI)

• Getting started (AWS SDK for Java)

• Getting started (AWS SDK for Python (Boto3))

Getting started with the Amazon Kendra console

The following procedures show how to create and test an Amazon Kendra index by using the AWS

console. In the procedures you create an index and a data source for an index. Finally, you test your

index by making a search request.

Step 1: To create an index (console)

1. Sign in to the AWS Management Console and open the Amazon Kendra console at https://

console.aws.amazon.com/kendra/.

2. Select Create index in the Indexes section.

3. In the Specify index details page, give your index a name and a description.

4. In IAM role, choose Create a new role and then give the role a name. The IAM role will have the prefix "AmazonKendra-".

5. Leave all of the other fields at their defaults. Choose Next.

6. In the Configure user access control page, choose Next.

7. In the Provisioning details page, choose Developer edition.

8. Choose Create to create your index.

9. Wait for your index to be created. Amazon Kendra provisions the hardware for your index. This

operation can take some time.

Step 2: To add a data source to an index (console)

1. View the available data sources to connect Amazon Kendra to and index your documents.

2. In the navigation pane, select Data sources and then select Add data source for your chosen

data source.

3. Follow the steps to configure the data source.

Step 3: To search an index (console)

1. In the navigation pane, choose the option to search your index.

2. Enter a search term that's appropriate for your index. The top results and top document

results are shown.

Getting started (AWS CLI)

The following procedure shows how to create an Amazon Kendra index using the AWS CLI. The procedure creates an index and a data source, and then runs a query on the index.

To create an Amazon Kendra index (CLI)

1. Do the Prerequisites.

2. Enter the following command to create an index.

aws kendra create-index \

--name cli-getting-started-index \

--description "Index for CLI getting started guide." \

--role-arn arn:aws:iam::account id:role/KendraRoleForGettingStartedIndex

3. Wait for Amazon Kendra to create the index. Check the progress using the following command.

When the status field is ACTIVE, go on to the next step.

aws kendra describe-index \

--id index id

4. At the command prompt, enter the following command to create a data source.

aws kendra create-data-source \

--index-id index id \

--name data source name \

--role-arn arn:aws:iam::account id:role/KendraRoleForGettingStartedDataSource \

--type S3 \

--configuration '{"S3Configuration":{"BucketName":"S3 bucket name"}}'

If you connect to your data source using a template schema, configure the template schema.

aws kendra create-data-source \

--index-id index id \

--name data source name \

--role-arn arn:aws:iam::account id:role/KendraRoleForGettingStartedDataSource \

--type TEMPLATE \

--configuration '{"TemplateConfiguration":{"Template":{JSON schema}}}'

5. It will take Amazon Kendra a while to create the data source. Enter the following command to

check the progress. When the status is ACTIVE, go on to the next step.

aws kendra describe-data-source \

--id data source ID \

--index-id index ID

6. Enter the following command to synchronize the data source.

aws kendra start-data-source-sync-job \

--id data source ID \

--index-id index ID

7. Amazon Kendra will index your data source. The amount of time that it takes depends on the number of documents. You can check the status of the sync job using the following command. When the status of the most recent job is SUCCEEDED, go on to the next step.

aws kendra list-data-source-sync-jobs \

--id data source ID \

--index-id index ID

8. Enter the following command to make a query.

aws kendra query \

--index-id index ID \

--query-text "search term"

The results of the search are displayed in JSON format.
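The JSON response can be long, so it often helps to reduce it to a few fields per result. The following is a minimal sketch in Python, assuming the response contains a ResultItems list whose entries carry Type, DocumentTitle, and DocumentExcerpt fields. The helper name summarize_results and the sample values are hypothetical and only for illustration.

```python
def summarize_results(response, limit=5):
    """Return (type, title, excerpt) tuples for the first `limit` result items."""
    rows = []
    for item in response.get("ResultItems", [])[:limit]:
        title = item.get("DocumentTitle", {}).get("Text", "")
        excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
        # Collapse newlines and repeated spaces so each excerpt prints on one line
        rows.append((item.get("Type", ""), title, " ".join(excerpt.split())))
    return rows


# Offline sample shaped like a query response (made-up values, not real output)
sample_response = {
    "ResultItems": [
        {
            "Type": "DOCUMENT",
            "DocumentTitle": {"Text": "Course overview"},
            "DocumentExcerpt": {"Text": "This course takes\n about ten hours."},
        }
    ]
}

for row in summarize_results(sample_response):
    print(row)
```

With a real client, the same helper could be applied to the value returned by kendra.query(IndexId=index_id, QueryText="search term"), where kendra is a Boto3 Amazon Kendra client.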

Getting started (AWS SDK for Python (Boto3))

The following program is an example of using Amazon Kendra in a Python program. The program

performs the following actions:

1. Creates a new index using the CreateIndex operation.

2. Waits for index creation to complete. It uses the DescribeIndex operation to monitor the status

of the index.

3. Once the index is active, it creates a data source using the CreateDataSource operation.

4. Waits for data source creation to complete. It uses the DescribeDataSource operation to monitor

the status of the data source.

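5. When the data source is active, it synchronizes the index with the contents of the data source using the StartDataSourceSyncJob operation.

6. Waits for the sync job to finish. It uses the ListDataSourceSyncJobs operation to monitor the status of the job.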

import boto3
from botocore.exceptions import ClientError
import pprint
import time

kendra = boto3.client("kendra")

print("Create an index.")

# Provide a name for the index
index_name = "python-getting-started-index"
# Provide an optional description for the index
description = "Getting started index"
# Provide the IAM role ARN required for indexes
index_role_arn = "arn:aws:iam::${accountId}:role/KendraRoleForGettingStartedIndex"

try:
    index_response = kendra.create_index(
        Description = description,
        Name = index_name,
        RoleArn = index_role_arn
    )

    pprint.pprint(index_response)

    index_id = index_response["Id"]

    print("Wait for Amazon Kendra to create the index.")

    while True:
        # Get the details of the index, such as the status
        index_description = kendra.describe_index(
            Id = index_id
        )
        # When the status is no longer CREATING, stop waiting
        status = index_description["Status"]
        print(" Creating index. Status: "+status)
        if status != "CREATING":
            break
        time.sleep(60)

    print("Create an S3 data source.")

    # Provide a name for the data source
    data_source_name = "python-getting-started-data-source"
    # Provide an optional description for the data source
    data_source_description = "Getting started data source."
    # Provide the IAM role ARN required for data sources
    data_source_role_arn = "arn:aws:iam::${accountId}:role/KendraRoleForGettingStartedDataSource"
    # Provide the data source connection information
    S3_bucket_name = "S3-bucket-name"
    data_source_type = "S3"
    # Configure the data source
    configuration = {"S3Configuration":
        {
            "BucketName": S3_bucket_name
        }
    }

    # If you connect to your data source using a template schema,
    # configure the template schema instead:
    #
    # configuration = {"TemplateConfiguration":
    #     {
    #         "Template": {JSON schema}
    #     }
    # }

    data_source_response = kendra.create_data_source(
        Name = data_source_name,
        Description = data_source_description,
        RoleArn = data_source_role_arn,
        Type = data_source_type,
        Configuration = configuration,
        IndexId = index_id
    )

    pprint.pprint(data_source_response)

    data_source_id = data_source_response["Id"]

    print("Wait for Amazon Kendra to create the data source.")

    while True:
        # Get the details of the data source, such as the status
        data_source_description = kendra.describe_data_source(
            Id = data_source_id,
            IndexId = index_id
        )
        # If status is not CREATING, then quit
        status = data_source_description["Status"]
        print(" Creating data source. Status: "+status)
        if status != "CREATING":
            break
        time.sleep(60)

    print("Synchronize the data source.")

    sync_response = kendra.start_data_source_sync_job(
        Id = data_source_id,
        IndexId = index_id
    )

    pprint.pprint(sync_response)

    print("Wait for the data source to sync with the index.")

    while True:
        jobs = kendra.list_data_source_sync_jobs(
            Id = data_source_id,
            IndexId = index_id
        )
        # For this example, there should be one job
        status = jobs["History"][0]["Status"]
        print(" Syncing data source. Status: "+status)
        # SYNCING_INDEXING means the job is still indexing the crawled documents
        if status not in ("SYNCING", "SYNCING_INDEXING"):
            break
        time.sleep(60)

except ClientError as e:
    print("%s" % e)

print("Program ends.")
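The three polling loops in the program above share one shape: read a status, stop when it is no longer an in-progress value, otherwise sleep and retry. The following is a minimal sketch of a reusable helper with an upper bound on the total wait, which the original loops do not have. The name wait_while and its parameters are hypothetical and not part of Boto3.

```python
import time


def wait_while(get_status, in_progress, delay_seconds=60, timeout_seconds=3600,
               sleep=time.sleep):
    """Poll get_status() until it returns a value outside `in_progress`.

    Returns the final status. Raises TimeoutError once the time spent
    sleeping reaches timeout_seconds while the status is still in progress.
    """
    waited = 0
    while True:
        status = get_status()
        if status not in in_progress:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError("still %s after %d seconds" % (status, waited))
        sleep(delay_seconds)
        waited += delay_seconds


# Example use with the client and IDs from the program above (not run here):
# status = wait_while(
#     lambda: kendra.describe_index(Id=index_id)["Status"],
#     in_progress={"CREATING"},
# )
```

Passing the sleep function as a parameter keeps the helper easy to exercise without real delays.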

Getting started (AWS SDK for Java)

The following program is an example of using Amazon Kendra in a Java program. The program

performs the following actions:

1. Creates a new index using the CreateIndex operation.

2. Waits for index creation to complete. It uses the DescribeIndex operation to monitor the status

of the index.

3. Once the index is active, it creates a data source using the CreateDataSource operation.

4. Waits for data source creation to complete. It uses the DescribeDataSource operation to monitor

the status of the data source.

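5. When the data source is active, it synchronizes the index with the contents of the data source using the StartDataSourceSyncJob operation.

6. Waits for the sync job to finish. It uses the ListDataSourceSyncJobs operation to monitor the status of the job.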

package com.amazonaws.kendra;

import java.util.concurrent.TimeUnit;
import software.amazon.awssdk.services.kendra.KendraClient;
import software.amazon.awssdk.services.kendra.model.CreateDataSourceRequest;
import software.amazon.awssdk.services.kendra.model.CreateDataSourceResponse;
import software.amazon.awssdk.services.kendra.model.CreateIndexRequest;
import software.amazon.awssdk.services.kendra.model.CreateIndexResponse;
import software.amazon.awssdk.services.kendra.model.DataSourceConfiguration;
import software.amazon.awssdk.services.kendra.model.DataSourceStatus;
import software.amazon.awssdk.services.kendra.model.DataSourceSyncJob;
import software.amazon.awssdk.services.kendra.model.DataSourceSyncJobStatus;
import software.amazon.awssdk.services.kendra.model.DataSourceType;
import software.amazon.awssdk.services.kendra.model.DescribeDataSourceRequest;
import software.amazon.awssdk.services.kendra.model.DescribeDataSourceResponse;
import software.amazon.awssdk.services.kendra.model.DescribeIndexRequest;
import software.amazon.awssdk.services.kendra.model.DescribeIndexResponse;
import software.amazon.awssdk.services.kendra.model.IndexStatus;
import software.amazon.awssdk.services.kendra.model.ListDataSourceSyncJobsRequest;
import software.amazon.awssdk.services.kendra.model.ListDataSourceSyncJobsResponse;
import software.amazon.awssdk.services.kendra.model.S3DataSourceConfiguration;
import software.amazon.awssdk.services.kendra.model.StartDataSourceSyncJobRequest;
import software.amazon.awssdk.services.kendra.model.StartDataSourceSyncJobResponse;

public class CreateIndexAndDataSourceExample {

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Create an index");

        String indexDescription = "Getting started index for Kendra";
        String indexName = "java-getting-started-index";
        String indexRoleArn = "arn:aws:iam::<your AWS account ID>:role/<name of an IAM role>";

        System.out.println(String.format("Creating an index named %s", indexName));
        KendraClient kendra = KendraClient.builder().build();
        CreateIndexRequest createIndexRequest = CreateIndexRequest
            .builder()
            .description(indexDescription)
            .name(indexName)
            .roleArn(indexRoleArn)
            .build();
        CreateIndexResponse createIndexResponse = kendra.createIndex(createIndexRequest);
        System.out.println(String.format("Index response %s", createIndexResponse));

        String indexId = createIndexResponse.id();

        System.out.println(String.format("Waiting until the index with index ID %s is created", indexId));
        while (true) {
            DescribeIndexRequest describeIndexRequest =
                DescribeIndexRequest.builder().id(indexId).build();
            DescribeIndexResponse describeIndexResponse =
                kendra.describeIndex(describeIndexRequest);
            IndexStatus status = describeIndexResponse.status();
            // Stop waiting once the index is no longer being created
            if (status != IndexStatus.CREATING) {
                break;
            }
            TimeUnit.SECONDS.sleep(60);
        }

        System.out.println("Creating an S3 data source");
        String dataSourceName = "java-getting-started-data-source";
        String dataSourceDescription = "Getting started data source";
        String s3BucketName = "an-aws-kendra-amzn-s3-demo-bucket";
        String dataSourceRoleArn = "arn:aws:iam::<your AWS account ID>:role/<name of an IAM role>";

        CreateDataSourceRequest createDataSourceRequest = CreateDataSourceRequest
            .builder()
            .indexId(indexId)
            .name(dataSourceName)
            .description(dataSourceDescription)
            .roleArn(dataSourceRoleArn)
            .type(DataSourceType.S3)
            .configuration(
                DataSourceConfiguration
                    .builder()
                    .s3Configuration(
                        S3DataSourceConfiguration
                            .builder()
                            .bucketName(s3BucketName)
                            .build()
                    ).build()
            ).build();

        CreateDataSourceResponse createDataSourceResponse =
            kendra.createDataSource(createDataSourceRequest);
        System.out.println(String.format("Response of creating data source: %s",
            createDataSourceResponse));

        String dataSourceId = createDataSourceResponse.id();

        System.out.println(String.format("Waiting for Kendra to create the data source %s", dataSourceId));
        DescribeDataSourceRequest describeDataSourceRequest = DescribeDataSourceRequest
            .builder()
            .indexId(indexId)
            .id(dataSourceId)
            .build();

        while (true) {
            DescribeDataSourceResponse describeDataSourceResponse =
                kendra.describeDataSource(describeDataSourceRequest);
            DataSourceStatus status = describeDataSourceResponse.status();
            System.out.println(String.format("Creating data source. Status: %s", status));
            if (status != DataSourceStatus.CREATING) {
                break;
            }
            TimeUnit.SECONDS.sleep(60);
        }

        System.out.println(String.format("Synchronize the data source %s", dataSourceId));
        StartDataSourceSyncJobRequest startDataSourceSyncJobRequest =
            StartDataSourceSyncJobRequest
                .builder()
                .indexId(indexId)
                .id(dataSourceId)
                .build();
        StartDataSourceSyncJobResponse startDataSourceSyncJobResponse =
            kendra.startDataSourceSyncJob(startDataSourceSyncJobRequest);

        System.out.println(String.format(
            "Waiting for the data source to sync with the index %s for execution ID %s",
            indexId, startDataSourceSyncJobResponse.executionId()));

        // For this particular list, there should be just one job
        ListDataSourceSyncJobsRequest listDataSourceSyncJobsRequest =
            ListDataSourceSyncJobsRequest
                .builder()
                .indexId(indexId)
                .id(dataSourceId)
                .build();

        while (true) {
            ListDataSourceSyncJobsResponse listDataSourceSyncJobsResponse =
                kendra.listDataSourceSyncJobs(listDataSourceSyncJobsRequest);
            DataSourceSyncJob job = listDataSourceSyncJobsResponse.history().get(0);
            System.out.println(String.format("Syncing data source. Status: %s", job.status()));

            // SYNCING_INDEXING means the job is still indexing the crawled documents
            if (job.status() != DataSourceSyncJobStatus.SYNCING
                    && job.status() != DataSourceSyncJobStatus.SYNCING_INDEXING) {
                break;
            }
            TimeUnit.SECONDS.sleep(60);
        }

        System.out.println("Index setup is complete");
    }
}

Getting started with an Amazon S3 data source (console)

You can use the Amazon Kendra console to get started using an Amazon S3 bucket as a data store.

When you use the console you specify all of the connection information you need to index the

contents of the bucket. For more information, see Amazon S3.

Use the following procedure to create a basic S3 bucket data source using the default configuration. The procedure assumes that you created an index following step 1 of Getting started with the Amazon Kendra console.

To create an S3 bucket data source using the Amazon Kendra console

1. Sign in to the AWS Management Console and open the Amazon Kendra console at https://console.aws.amazon.com/kendra/home.

2. From the list of indexes, choose the index that you want to add the data source to.

3. Choose Add data sources.

4. From the list of data source connectors, choose Amazon S3.

5. On the Deﬁne attributes page, give your data source a name and optionally a description.

Leave the Tags ﬁeld blank. Choose Next to continue.

6. In the Enter the data source location ﬁeld, enter the name of the S3 bucket that contains

your documents. You can enter the name directly, or you can browse for the name by choosing

Browse. The bucket must be in the same Region as the index.

7. In IAM role choose Create a new role and then type a role name. For more information, see

IAM roles for Amazon S3 data sources.

8. In the Set sync run schedule section, choose Run on demand.

9. Choose Next to continue.

10. On the Review and create page review the details of your S3 data source. If you want to make

changes, choose the Edit button next to the item that you want to change. When you are

satisﬁed with your choices, choose Create to create your S3 data source.

After you choose Create, Amazon Kendra starts creating the data source. It can take several

minutes for the data source to be created. When it is ﬁnished, the status of the data source

changes from Creating to Active.

After creating the data source, you need to sync the Amazon Kendra index with the data source.

Choose Sync now to start the sync process. It can take several minutes to several hours to

synchronize the data source, depending on the number and size of the documents.

Getting started with a MySQL database data source (console)

You can use the Amazon Kendra console to get started using a MySQL database as a data source.

When you use the console you specify the connection information you need to index the contents

of a MySQL database. For more information, see Using a database data source.

You ﬁrst need to create a MySQL database, then you can create a data source for the database.

Use the following procedure to create a basic MySQL database. The procedure assumes that you

have already created an index following step 1 of Getting started with the Amazon Kendra console.

To create a MySQL database

1. Sign in to the AWS Management Console and open the Amazon RDS console at https://console.aws.amazon.com/rds/.

2. From the navigation pane, choose Subnet groups and then choose Create DB Subnet Group.

3. Name the group and choose your Virtual Private Cloud (VPC). For more information on

conﬁguring a VPC, see Conﬁguring Amazon Kendra to use a VPC.

4. Add your VPC's private subnets. Your private subnets are the ones that are not connected to

your NAT. Choose Create.

5. From the navigation pane, choose Databases and then choose Create database.

6. Use the following parameters to create the database. Leave all of the other parameters at their

defaults.

• Engine options—MySQL

• Templates—Free tier

• Credential Settings—Enter and conﬁrm a password

• Under Connectivity, choose Additional connectivity conﬁguration. Make the following

choices.

• Subnet group—Choose the subnet group that you created in step 4.

• VPC security group—Choose the group that contains both inbound and outbound rules

that you created in your VPC. For example, DataSourceSecurityGroup. For more

information on conﬁguring a VPC, see Conﬁguring Amazon Kendra to use a VPC.

• Under Additional configuration, set the Initial database name to content.

7. Choose Create database.

8. From the list of databases, choose your new database. Make a note of the database endpoint.

9. After you create your database, you must create a table to hold your documents. Creating

a table is outside the scope of these instructions. When you create your table, note the

following:

• Database name—content

• Table name—documents

• Columns—ID, Title, Body, and LastUpdate. You can include additional columns if you want.

Now that you have created your MySQL database, you can create a data source for the database.

To create a MySQL data source

1. Sign in to the AWS Management Console and open the Amazon Kendra console at https://console.aws.amazon.com/kendra/home.

2. From the navigation pane, choose Indexes and then choose your index.

3. Choose Add data sources and then choose Amazon RDS.

4. Type a name and description for the data source and then choose Next.

5. Choose MySQL.

6. Under Connection access, enter the following information:

• Endpoint—The endpoint of the database that you created earlier.

• Port—The port number for the database. For MySQL, the default is 3306.

• Type of authentication—Choose New.

• New secret container name—A name for the Secrets Manager container for the database

credentials.

• Username—The name of a user with administrative access to the database.

• Password—The password for the user. After you enter it, choose Save authentication.

• Database name—content.

• Table name—documents.

• IAM role—Choose Create a new role, and then type a name for the role.

7. In Column configuration enter the following:

• Document ID column name—ID

• Document title column name—Title

• Document data column name—Body

8. In Column change detection enter the following:

• Change detecting columns—LastUpdate

9. In Conﬁgure VPC & security group provide the following:

• In Virtual Private Cloud (VPC), choose your VPC.

• In Subnets, choose the private subnets that you created in your VPC.

• In VPC security groups, choose the security group that contains both inbound and

outbound rules that you created in your VPC for MySQL databases. For example,

DataSourceSecurityGroup.

10. In Set sync run schedule, choose Run on demand and then choose Next.

11. In Data source ﬁeld mapping, choose Next.

12. Review the conﬁguration of your data source to make sure that it is correct. When you're

satisﬁed that everything is correct, choose Create.
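For readers who script this setup instead of using the console, the values entered above map onto the Configuration argument of the CreateDataSource API. The following is a rough sketch in Python. The field names reflect my understanding of the DatabaseConfiguration structure and should be checked against the API reference before use. The function name and the endpoint, secret ARN, subnet, and security group values in the usage line are hypothetical placeholders.

```python
def mysql_data_source_configuration(endpoint, secret_arn, subnet_ids,
                                    security_group_ids, port=3306):
    """Build a Configuration dict that mirrors the console values used above."""
    return {
        "DatabaseConfiguration": {
            # Assumed engine type value for a MySQL database hosted on Amazon RDS
            "DatabaseEngineType": "RDS_MYSQL",
            "ConnectionConfiguration": {
                "DatabaseHost": endpoint,
                "DatabasePort": port,
                "DatabaseName": "content",
                "TableName": "documents",
                # Secrets Manager secret that holds the user name and password
                "SecretArn": secret_arn,
            },
            "ColumnConfiguration": {
                "DocumentIdColumnName": "ID",
                "DocumentTitleColumnName": "Title",
                "DocumentDataColumnName": "Body",
                "ChangeDetectingColumns": ["LastUpdate"],
            },
            "VpcConfiguration": {
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": list(security_group_ids),
            },
        }
    }


# Placeholder inputs for illustration only
configuration = mysql_data_source_configuration(
    endpoint="database-endpoint",
    secret_arn="secret-arn",
    subnet_ids=["subnet-id-1", "subnet-id-2"],
    security_group_ids=["security-group-id"],
)
```

The resulting dictionary would be passed as Configuration to create_data_source, together with a data source type of DATABASE, the index ID, a name, and the IAM role ARN.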

Getting started with an AWS IAM Identity Center identity source (console)

An AWS IAM Identity Center identity source contains information on your users and groups. This is

useful for setting up user context ﬁltering, where Amazon Kendra ﬁlters search results for diﬀerent

users based on the user or their group's access to documents.

To create an IAM Identity Center identity source, you must activate IAM Identity Center and create

an organization in AWS Organizations. When you activate IAM Identity Center and create an

organization for the ﬁrst time, it automatically defaults to the Identity Center directory as the

identity source. You can change to Active Directory (Amazon managed or self-managed) or an

external identity provider as your identity source. You must follow the correct guidance for this —

see Changing your IAM Identity Center identity source. You can have only one identity source per

organization.

In order for your users and groups to be assigned diﬀerent levels of access to documents, you need

to include your users and groups in your access control list when you ingest documents into your

index. This allows your users and groups to search for documents in Amazon Kendra in accordance

with their level of access. When you issue a query, the user ID needs to be an exact match of the

user name in IAM Identity Center.

You must also grant the required permissions to use IAM Identity Center with Amazon Kendra. For

more information, see IAM roles for IAM Identity Center.

To set up an IAM Identity Center identity source

1. Open the IAM Identity Center console.

2. Choose Enable IAM Identity Center, and then choose Create AWS organization.

Identity Center directory is created by default, and an email is sent to you to verify the email

address associated with the organization.

3. To add a group to your AWS organization, in the navigation pane, choose Groups.

4. On the Groups page, choose Create group and enter a group name and description in the

dialog box. Choose Create.

5. To add a user to your AWS organization, in the navigation pane, choose Users.

6. On the Users page, choose Add user. Under User details, specify all required ﬁelds. For

Password, choose Send an email to the user. Choose Next.

7. To add a user to a group, choose Groups and select a group.

8. On the Details page, under Group members, choose Add user.

9. On the Add users to group page, select the user you want to add as a member of the group.

You can select multiple users to add to a group.

10. To sync your list of users and groups with IAM Identity Center, change your identity source to

Active Directory or External identity provider.

Identity Center directory is the default identity source and requires you to manually add your

users and groups using this source if you do not have your own list managed by a provider. To

change your identity source, you must follow the correct guidance for this—see Changing your

IAM Identity Center identity source.

Note

If using Active Directory or an external identity provider as your identity source, you must

map the email addresses of your users to IAM Identity Center user names when you specify

the System for Cross-domain Identity Management (SCIM) protocol. For more information,

see the IAM Identity Center guide on SCIM for enabling IAM Identity Center.

Once you have set up your IAM Identity Center identity source, you can activate this in the console

when you create or edit your index. Go to User access control in your index settings and edit your

settings to allow fetching user-group information from IAM Identity Center.

You can also activate IAM Identity Center using the UserGroupResolutionConfiguration object. You provide the UserGroupResolutionMode as AWS_SSO and create an IAM role that gives permission to call sso:ListDirectoryAssociations, sso-directory:SearchUsers, sso-directory:ListGroupsForUser, and sso-directory:DescribeGroups.
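As an illustration, the four permissions listed above could be expressed as an IAM policy document along the following lines. This is only a sketch: the function name is hypothetical, and the "*" resource is an assumption made for brevity. See IAM roles for IAM Identity Center for the policy that Amazon Kendra actually requires.

```python
import json

# The four actions named in the paragraph above
SSO_ACTIONS = [
    "sso:ListDirectoryAssociations",
    "sso-directory:SearchUsers",
    "sso-directory:ListGroupsForUser",
    "sso-directory:DescribeGroups",
]


def identity_center_policy(resource="*"):
    """Return a policy document that allows the IAM Identity Center lookups."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(SSO_ACTIONS),
                # Assumption: scope this down if your account setup allows it
                "Resource": resource,
            }
        ],
    }


print(json.dumps(identity_center_policy(), indent=2))
```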

Warning

Amazon Kendra currently does not support using

UserGroupResolutionConfiguration with an AWS organization member account for

your IAM Identity Center identity source. You must create your index in the management

account for the organization in order to use UserGroupResolutionConfiguration.

The following is an overview of how to set up a data source with

UserGroupResolutionConfiguration and user access control to ﬁlter search results on user

context. This assumes you have already created an index and an IAM role for indexes. You create an

index and provide the IAM role using the CreateIndex API.

Setting up a data source with UserGroupResolutionConfiguration and user context filtering

1. Create an IAM role that gives permission to access your IAM Identity Center identity source.

2. Configure UserGroupResolutionConfiguration by setting the mode to AWS_SSO and call UpdateIndex to update your index to use IAM Identity Center.

3. If you want to use token-based user access control to ﬁlter search results on user context,

set UserContextPolicy to USER_TOKEN when you call UpdateIndex. Otherwise, Amazon

Kendra crawls the access control list for each of your documents for most data source

connectors. You can also ﬁlter search results on user context in the Query API by providing

user and group information in UserContext. You can also map users to their groups using

PutPrincipalMapping so that you only need to provide the user ID when you issue the query.

4. Create an IAM role that gives permission to access your data source.

5. Conﬁgure your data source. You must provide the required connection information to connect

to your data source.

6. Create a data source using the CreateDataSource API. Provide the

DataSourceConfiguration object, which includes TemplateConfiguration, the ID of

your index, the IAM role for your data source, the data source type, and give your data source a

name. You can also update your data source.
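Steps 2 and 3 can be sketched in Python as follows. The two helper functions are hypothetical. They only assemble keyword arguments, on the assumption that the Boto3 update_index and query methods accept the parameter names used in the API (Id, UserGroupResolutionConfiguration, UserContextPolicy, IndexId, QueryText, UserContext). Token settings for USER_TOKEN are not shown.

```python
def update_index_kwargs(index_id, use_tokens=False):
    """Arguments for UpdateIndex that switch the index to IAM Identity Center."""
    kwargs = {
        "Id": index_id,
        "UserGroupResolutionConfiguration": {"UserGroupResolutionMode": "AWS_SSO"},
    }
    if use_tokens:
        # Step 3: filter on user context with token-based access control
        kwargs["UserContextPolicy"] = "USER_TOKEN"
    return kwargs


def query_kwargs(index_id, query_text, user_id, groups=None):
    """Arguments for Query that carry user and group information."""
    user_context = {"UserId": user_id}
    if groups:
        user_context["Groups"] = list(groups)
    return {"IndexId": index_id, "QueryText": query_text, "UserContext": user_context}


# Example use with a Boto3 client named kendra (not run here):
# kendra.update_index(**update_index_kwargs(index_id))
# kendra.query(**query_kwargs(index_id, "search term", "user-id"))
```

The user ID passed to the query must exactly match the user name in IAM Identity Center, as noted earlier in this section.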

Changing your IAM Identity Center identity source

Warning

Changing your identity source in IAM Identity Center Settings might affect the preservation of user and group information. To do this safely, we recommend that you review Considerations for changing your identity source. When you change your identity source, a new identity source ID is generated. Check that you are using the correct ID before you set the mode to AWS_SSO in UserGroupResolutionConfiguration.

To change your IAM Identity Center identity source

1. Open the IAM Identity Center console.

2. Choose Settings.

3. On the Settings page, under Identity source, choose Change.

4. On the Change identity source page, select your preferred identity source, and then choose

Next.

Creating an index

You can create an index using the console, or by calling the CreateIndex API. You can use the AWS Command Line Interface (AWS CLI) or SDK with the API. After you create your index, you can add documents directly to it or from a data source.

To create an index, you must provide the Amazon Resource Name (ARN) of an AWS Identity and

Access Management (IAM) role for indexes to access CloudWatch. For more information, see IAM

roles for indexes.

The following tabs provide a procedure for creating an index by using the AWS Management

Console, and code examples for using the AWS CLI, and Python and Java SDKs.

Console

To create an index

1. Sign in to the AWS Management Console and open the Amazon Kendra console at https://console.aws.amazon.com/kendra/.

2. Select Create index in the Indexes section.

3. In Specify index details, give your index a name and a description.

4. In IAM role provide an IAM role. To ﬁnd a role, choose from roles in your account that

contain the word "kendra" or enter the name of another role. For more information about

the permissions that the role requires, see IAM roles for indexes.

5. Choose Next.

6. On the Conﬁgure user access control page, choose Next. You can update your index to use

tokens for access control after you create an index. For more information, see Controlling

access to documents.

7. On the Provisioning details page, choose Create.

8. It might take some time for the index to create. Check the list of indexes to watch the

progress of creating your index. When the status of the index is ACTIVE, your index is ready

to use.

AWS CLI

To create an index

1. Use the following command to create an index. The role-arn must be the Amazon Resource Name (ARN) of an IAM role that can run Amazon Kendra actions. For more information, see IAM roles.

The command is formatted for Linux and macOS. If you are using Windows, replace the

Unix line continuation character (\) with a caret (^).

aws kendra create-index \

--name index name \

--description "index description" \

--role-arn arn:aws:iam::account ID:role/role name

2. It might take some time for the index to create. To check the state of your index, use the

index ID returned by create-index with the following command. When the status of the

index is ACTIVE, your index is ready to use.

aws kendra describe-index \

--id index ID

Python

To create an index

• Provide values for the following variables in the code example that follows:

    • description—A description of the index that you're creating. This is optional.

    • index_name—The name of the index that you're creating.

    • role_arn—The Amazon Resource Name (ARN) of a role that can run Amazon Kendra APIs. For more information, see IAM roles.

import boto3
from botocore.exceptions import ClientError
import pprint
import time

kendra = boto3.client("kendra")

print("Create an index.")

# Provide a name for the index
index_name = "index-name"
# Provide an optional description for the index
description = "index description"
# Provide the IAM role ARN required for indexes
role_arn = "arn:aws:iam::${account id}:role/${role name}"

try:
    index_response = kendra.create_index(
        Name = index_name,
        Description = description,
        RoleArn = role_arn
    )

    pprint.pprint(index_response)

    index_id = index_response["Id"]

    print("Wait for Amazon Kendra to create the index.")

    while True:
        # Get the details of the index, such as the status
        index_description = kendra.describe_index(
            Id = index_id
        )
        # If status is not CREATING, then quit
        status = index_description["Status"]
        print(" Creating index. Status: "+status)
        if status != "CREATING":
            break
        time.sleep(60)

except ClientError as e:
    print("%s" % e)

print("Program ends.")

Java

To create an index

• Provide values for the following variables in the code example that follows:

    • indexDescription—A description of the index that you're creating. This is optional.

    • indexName—The name of the index that you're creating.

    • indexRoleArn—The Amazon Resource Name (ARN) of a role that can run Amazon Kendra APIs. For more information, see IAM roles.

package com.amazonaws.kendra;

import java.util.concurrent.TimeUnit;
import software.amazon.awssdk.services.kendra.KendraClient;
import software.amazon.awssdk.services.kendra.model.CreateIndexRequest;
import software.amazon.awssdk.services.kendra.model.CreateIndexResponse;
import software.amazon.awssdk.services.kendra.model.DescribeIndexRequest;
import software.amazon.awssdk.services.kendra.model.DescribeIndexResponse;
import software.amazon.awssdk.services.kendra.model.IndexStatus;

public class CreateIndexExample {

    public static void main(String[] args) throws InterruptedException {

        String indexDescription = "Getting started index for Kendra";
        String indexName = "java-getting-started-index";
        String indexRoleArn = "arn:aws:iam::<your AWS account ID>:role/KendraRoleForGettingStartedIndex";

        System.out.println(String.format("Creating an index named %s", indexName));
        CreateIndexRequest createIndexRequest = CreateIndexRequest
            .builder()
            .description(indexDescription)
            .name(indexName)
            .roleArn(indexRoleArn)
            .build();
        KendraClient kendra = KendraClient.builder().build();
        CreateIndexResponse createIndexResponse = kendra.createIndex(createIndexRequest);
        System.out.println(String.format("Index response %s", createIndexResponse));

        String indexId = createIndexResponse.id();

        System.out.println(String.format("Waiting until the index with ID %s is created.", indexId));
        while (true) {
            DescribeIndexRequest describeIndexRequest =
                DescribeIndexRequest.builder().id(indexId).build();
            DescribeIndexResponse describeIndexResponse =
                kendra.describeIndex(describeIndexRequest);
            IndexStatus status = describeIndexResponse.status();
            // Stop waiting once the index is no longer being created
            if (status != IndexStatus.CREATING) {
                break;
            }
            TimeUnit.SECONDS.sleep(60);
        }

        System.out.println("Index creation is complete.");
    }
}

After you create your index, you can add documents to it. You can add them directly or create a data source that updates your index on a regular schedule.

Topics

• Adding documents directly to an index with batch upload

• Adding frequently asked questions (FAQs) to an index

• Creating custom document ﬁelds

• Controlling user access to documents with tokens

Adding documents directly to an index with batch upload

You can add documents directly to an index using the BatchPutDocument API. You can't add

documents directly using the console. If you use the console, you connect to a data source to add

documents to your index. Documents can be added from an S3 bucket or supplied as binary data.

For a list of document types supported by Amazon Kendra see Types of documents.

Adding documents to an index using BatchPutDocument is an asynchronous operation. After you call the BatchPutDocument API, use the BatchGetDocumentStatus API to monitor the progress of indexing your documents. When you call the BatchGetDocumentStatus API with a list of document IDs, it returns the status of each document. When the status of a document is INDEXED or FAILED, processing of that document is complete. When the status is FAILED, the BatchGetDocumentStatus API returns the reason that the document couldn't be indexed.
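The polling pattern described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name, the polling interval, and the retry limit are assumptions, and the `kendra` argument is expected to be a boto3 Kendra client (for example, `boto3.client("kendra")`).

```python
import time

# Statuses that the text above describes as "processing complete".
TERMINAL_STATUSES = {"INDEXED", "FAILED"}


def wait_for_documents(kendra, index_id, document_ids, poll_seconds=30, max_polls=20):
    """Poll BatchGetDocumentStatus until every document is INDEXED or FAILED.

    Returns (finished, pending): a dict of final status records keyed by
    document ID, and the list of IDs still processing when polling stopped.
    """
    pending = list(document_ids)
    finished = {}
    for _ in range(max_polls):
        response = kendra.batch_get_document_status(
            IndexId=index_id,
            DocumentInfoList=[{"DocumentId": doc_id} for doc_id in pending],
        )
        for status in response.get("DocumentStatusList", []):
            if status.get("DocumentStatus") in TERMINAL_STATUSES:
                finished[status["DocumentId"]] = status
        pending = [doc_id for doc_id in pending if doc_id not in finished]
        if not pending:
            break
        time.sleep(poll_seconds)
    return finished, pending
```

For a document whose status is FAILED, the returned record is where you would look for the failure reason that the API reports.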

If you want to alter your content and document metadata ﬁelds or attributes during the document

ingestion process, see Amazon Kendra Custom Document Enrichment. If you want to use a custom

data source, each document you submit using the BatchPutDocument API requires a data source

ID and execution ID as attributes or ﬁelds. For more information, see Required attributes for

custom data sources.

Note

Each document ID must be unique per index. You cannot create a data source to index

your documents with their unique IDs and then use the BatchPutDocument API to

index the same documents, or vice versa. You can delete a data source and then use

the BatchPutDocument API to index the same documents, or vice versa. Using the

BatchPutDocument and BatchDeleteDocument APIs in combination with an Amazon

Kendra data source connector for the same set of documents could cause inconsistencies

with your data. Instead, we recommend using the Amazon Kendra custom data source

connector.

The following developer guide documents show how to add documents directly to an index.

Topics

• Adding documents with the BatchPutDocument API

• Adding documents from an S3 bucket

Adding documents with the BatchPutDocument API

The following example adds a blob of text to an index by calling BatchPutDocument. You can use

the BatchPutDocument API to add documents directly to your index. For a list of document types supported by Amazon Kendra, see Types of documents.

For an example of creating an index using the AWS CLI and SDKs, see Creating an index. To set up

the CLI and SDKs, see Setting up Amazon Kendra.

Note

Files added to the index must be in a UTF-8 encoded byte stream.

In the following examples, UTF-8 encoded text is added to the index.

CLI

In the AWS Command Line Interface, use the following command. The command is formatted

for Linux and macOS. If you are using Windows, replace the Unix line continuation character (\)

with a caret (^).

aws kendra batch-put-document \

--index-id index-id \

--documents '{"Id":"doc-id-1", "Blob":"Amazon.com is an online retailer.",

"ContentType":"PLAIN_TEXT", "Title":"Information about Amazon.com"}'

Python

import boto3

kendra = boto3.client("kendra")

# Provide the index ID

index_id = "index-id"

# Provide the title and text

title = "Information about Amazon.com"

text = "Amazon.com is an online retailer."

document = {

"Id": "1",

"Blob": text,

"ContentType": "PLAIN_TEXT",

"Title": title

}

documents = [

document

]

result = kendra.batch_put_document(

IndexId = index_id,

Documents = documents

)

print(result)

Java

package com.amazonaws.kendra;

import software.amazon.awssdk.core.SdkBytes;

import software.amazon.awssdk.services.kendra.KendraClient;

import software.amazon.awssdk.services.kendra.model.BatchPutDocumentRequest;

import software.amazon.awssdk.services.kendra.model.BatchPutDocumentResponse;

import software.amazon.awssdk.services.kendra.model.ContentType;

import software.amazon.awssdk.services.kendra.model.Document;

public class AddDocumentsViaAPIExample {

public static void main(String[] args) {

KendraClient kendra = KendraClient.builder().build();

String indexId = "yourIndexId";

Document testDoc = Document

.builder()

.title("The title of your document")

.id("a_doc_id")

.blob(SdkBytes.fromUtf8String("your text content"))

.contentType(ContentType.PLAIN_TEXT)

.build();

BatchPutDocumentRequest batchPutDocumentRequest = BatchPutDocumentRequest

.builder()

.indexId(indexId)

.documents(testDoc)

.build();

BatchPutDocumentResponse result =

kendra.batchPutDocument(batchPutDocumentRequest);

System.out.println(String.format("BatchPutDocument Result: %s", result));

}

}

Adding documents from an S3 bucket

You can add documents directly to your index from an Amazon S3 bucket using the

BatchPutDocument API. You can add up to 10 documents in the same call. When you use an S3

bucket, you must provide an IAM role with permission to access the bucket that contains your

documents. You specify the role in the RoleArn parameter.
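Because a single call accepts at most 10 documents, a larger set has to be split across several calls. The following Python sketch shows one way to do that. The helper names are hypothetical, and `kendra` is assumed to be a boto3 Kendra client.

```python
MAX_DOCUMENTS_PER_CALL = 10  # limit stated in the text above


def batch_documents(documents, batch_size=MAX_DOCUMENTS_PER_CALL):
    """Yield successive slices of at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


def put_documents_from_s3(kendra, index_id, role_arn, documents):
    """Call BatchPutDocument once per batch and collect reported failures."""
    failed = []
    for batch in batch_documents(documents):
        response = kendra.batch_put_document(
            IndexId=index_id,
            RoleArn=role_arn,
            Documents=batch,
        )
        # Documents rejected at submission time are listed in the response.
        failed.extend(response.get("FailedDocuments", []))
    return failed
```

An empty return value only means that the documents were accepted for processing. Indexing itself is asynchronous.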

Using the BatchPutDocument API to add documents from an Amazon S3 bucket is a one-time

operation. To keep an index synchronized with the contents of a bucket, create an Amazon S3 data

source. For more information, see Amazon S3 data source.

For an example of creating an index using the AWS CLI and SDKs, see Creating an index. To set up

the CLI and SDKs, see Setting up Amazon Kendra. For information on creating an S3 bucket, see

Amazon Simple Storage Service documentation.

In the following example, two Microsoft Word documents are added to the index using the

BatchPutDocument API.

Python

import boto3

kendra = boto3.client("kendra")

# Provide the index ID

index_id = "index-id"

# Provide the IAM role ARN required to index documents in an S3 bucket

role_arn = "arn:aws:iam::${accountId}:role/${roleName}"

doc1_s3_file_data = {

"Bucket": "bucket-name",

"Key": "document1.docx"

}

doc1_document = {

"S3Path": doc1_s3_file_data,

"Title": "Document 1 title",

"Id": "doc_1"

}

doc2_s3_file_data = {

"Bucket": "bucket-name",

"Key": "document2.docx"

}

doc2_document = {

"S3Path": doc2_s3_file_data,

"Title": "Document 2 title",

"Id": "doc_2"

}

documents = [

doc1_document,

doc2_document

]

result = kendra.batch_put_document(

Documents = documents,

IndexId = index_id,

RoleArn = role_arn

)

print(result)

Java

package com.amazonaws.kendra;

import software.amazon.awssdk.services.kendra.KendraClient;

import software.amazon.awssdk.services.kendra.model.BatchPutDocumentRequest;

import software.amazon.awssdk.services.kendra.model.BatchPutDocumentResponse;

import software.amazon.awssdk.services.kendra.model.Document;

import software.amazon.awssdk.services.kendra.model.S3Path;

public class AddFilesFromS3Example {

public static void main(String[] args) {

KendraClient kendra = KendraClient.builder().build();

String indexId = "yourIndexId";

String roleArn = "yourIndexRoleArn";

Document pollyDoc = Document

.builder()

.s3Path(

S3Path.builder()

.bucket("an-aws-kendra-amzn-s3-demo-bucket")

.key("What is Amazon Polly.docx")

.build())

.title("What is Amazon Polly")

.id("polly_doc_1")

.build();

Document rekognitionDoc = Document

.builder()

.s3Path(

S3Path.builder()

.bucket("an-aws-kendra-amzn-s3-demo-bucket")

.key("What is Amazon Rekognition.docx")

.build())

.title("What is Amazon Rekognition")

.id("rekognition_doc_1")

.build();

BatchPutDocumentRequest batchPutDocumentRequest = BatchPutDocumentRequest

.builder()

.indexId(indexId)

.roleArn(roleArn)

.documents(pollyDoc, rekognitionDoc)

.build();

BatchPutDocumentResponse result =

kendra.batchPutDocument(batchPutDocumentRequest);

System.out.println(String.format("BatchPutDocument result: %s", result));

}

}

Adding frequently asked questions (FAQs) to an index

You can add frequently asked questions (FAQs) directly to your index using the console or the

CreateFaq API. Adding FAQs to an index is an asynchronous operation. You put the data for the FAQ

in a ﬁle that you store in an Amazon Simple Storage Service bucket. You can use CSV or JSON ﬁles

as input for your FAQ:

• Basic CSV—A CSV ﬁle where each row contains a question, answer, and an optional source URI.

• Custom CSV—A CSV ﬁle that contains questions, answers, and headers for custom ﬁelds/

attributes that you can use to facet, display, or sort FAQ responses. You can also deﬁne access

control ﬁelds to limit the FAQ response to certain users and groups that are allowed to see the

FAQ response.

• JSON—A JSON ﬁle that contains questions, answers, and custom ﬁelds/attributes that you can

use to facet, display, or sort FAQ responses. You can also deﬁne access control ﬁelds to limit the

FAQ response to certain users and groups that are allowed to see the FAQ response.

For example, the following is a basic CSV file that provides answers to questions about free clinics in Spokane, Washington, USA and Mountain View, Missouri, USA.

How many free clinics are in Spokane WA?, 13

How many free clinics are there in Mountain View Missouri?, 7

Note

The FAQ ﬁle must be a UTF-8-encoded ﬁle.

Topics

• Creating index ﬁelds for an FAQ ﬁle

• Basic CSV ﬁle

• Custom CSV ﬁle

• JSON ﬁle

• Using your FAQ ﬁle

• FAQ ﬁles in languages other than English

Creating index ﬁelds for an FAQ ﬁle

When you use a custom CSV or JSON ﬁle for input, you can declare custom ﬁelds for your FAQ

questions. For example, you can create a custom ﬁeld that assigns each FAQ question a business

department. When the FAQ is returned in a response, you can use the department as a facet to

narrow the search to "HR" or "Finance" only, for example.

A custom ﬁeld must map to an index ﬁeld. In the console, you use the Facet deﬁnition page

to create an index ﬁeld. When using the API, you must ﬁrst create an index ﬁeld using the

UpdateIndex API.

The field/attribute type in the FAQ file must match the type of the associated index field. For example, if the "Department" field is a STRING_LIST type field, you must provide values for the department field as a string list in your FAQ file. You can check the type of index fields using the Facet definition page in the console or by using the DescribeIndex API.

When you create an index ﬁeld that maps to a custom attribute, you can mark it displayable,

facetable, or sortable. You can't make a custom attribute searchable.

In addition to the custom attributes, you can also use the Amazon Kendra reserved or common

ﬁelds in a custom CSV or JSON ﬁle. For more information, see Document attributes or ﬁelds.

Basic CSV ﬁle

Use a basic CSV ﬁle when you want to use a simple structure for your FAQs. In a basic CSV ﬁle,

each row has two or three ﬁelds: a question, an answer, and an optional source URI that points to a

document with more information.

The contents of the ﬁle must follow the RFC 4180 Common Format and MIME Type for Comma-

Separated Values (CSV) Files.

The following is a FAQ ﬁle in the basic CSV format.

How many free clinics are in Spokane WA?, 13, https://s3.region.company.com/bucket-name/directory/faq.csv

How many free clinics are there in Mountain View Missouri?, 7, https://s3.region.company.com/bucket-name/directory/faq.csv
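If you generate the basic CSV file programmatically, a CSV library can handle the RFC 4180 quoting for you. The following Python sketch uses the standard `csv` module. The function name, the second sample question, and the `example.com` URI are made up for illustration.

```python
import csv
import io


def build_basic_faq_csv(rows):
    """Return basic FAQ CSV text: question, answer, optional source URI."""
    buffer = io.StringIO()
    # RFC 4180 uses CRLF line breaks; fields containing commas are quoted.
    writer = csv.writer(buffer, lineterminator="\r\n")
    for question, answer, *source_uri in rows:
        writer.writerow([question, answer, *source_uri])
    return buffer.getvalue()


faq_csv = build_basic_faq_csv([
    ("How many free clinics are in Spokane WA?", "13"),
    ("Are clinics open on weekends, or only weekdays?", "Weekdays only",
     "https://example.com/faq"),
])
print(faq_csv)
```

When you save the result to a file, write it with UTF-8 encoding, because the FAQ file must be UTF-8 encoded.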

Custom CSV ﬁle

Use a custom CSV ﬁle when you want to add custom ﬁelds/attributes to your FAQ questions. For a

custom CSV ﬁle, you use a header row in your CSV ﬁle to deﬁne the additional attributes.

The CSV ﬁle must contain the following two required ﬁelds:

• _question—The frequently asked question

• _answer—The answer to the frequently asked question

Your ﬁle can contain both Amazon Kendra reserved ﬁelds and custom ﬁelds. The following is an

example of a custom CSV ﬁle.

_question,_answer,_last_updated_at,custom_string

How many free clinics are in Spokane WA?, 13, 2012-03-25T12:30:10+01:00, Note: Some free clinics require you to meet certain criteria in order to use their services

How many free clinics are there in Mountain View Missouri?, 7, 2012-03-25T12:30:10+01:00, Note: Some free clinics require you to meet certain criteria in order to use their services

The contents of the custom ﬁle must follow the RFC 4180 Common Format and MIME Type for

Comma-Separated Values (CSV) Files.

The following lists the types of custom ﬁelds:

• Date—ISO 8601-encoded date and time values.

For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25, 2012,

at 12:30PM (plus 10 seconds) in the Central European Time time zone.

• Long—Numbers, such as 1234.

• String—String values. If your string contains commas, enclose the entire value in double

quotation marks (") (for example, "custom attribute, and more").

• String list—A list of string values. List the values in a comma-separated list that's enclosed in

quotation marks (") (for example, "item1, item2, item3"). If the list contains only a single

entry, you can omit the quotation marks (for example, item1).

A custom CSV ﬁle can contain user access control ﬁelds. You can use these ﬁelds to limit access to

the FAQ to certain users and groups. To ﬁlter on user context, the user must provide user and group

information in the query. Otherwise, all relevant FAQs are returned. For more information, see User

context ﬁltering.

The following lists the user context filters for FAQs:

• _acl_user_allow—Users in the allow list can see the FAQ in the query response. The FAQ isn't returned to other users.

• _acl_user_deny—Users in the deny list can't see the FAQ in the query response. The FAQ is returned to all other users when it's relevant to the query.

• _acl_group_allow—Users that are members of an allowed group can see the FAQ in the query response. The FAQ isn't returned to users that are members of another group.

• _acl_group_deny—Users that are members of a denied group can't see the FAQ in the query response. The FAQ is returned to other groups when it's relevant to the query.

Provide the values for the allow and deny lists in comma-separated lists enclosed in quotation marks (for example, "user1,user2,user3"). You can include a user or a group in either an allow list or a deny list, but not both. For example, the same user can't be allowed individually and also denied through a group. If you include a user or group in both, you receive an error.

The following is an example of a custom CSV ﬁle with user context information.

_question, _answer, _acl_user_allow, _acl_user_deny, _acl_group_allow, _acl_group_deny

How many free clinics are in Spokane WA?, 13, "userID6201,userID7552", "userID1001,userID2020", groupBasicPlusRate, groupPremiumRate
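The following Python sketch writes a custom CSV file with user allow and deny columns and checks locally that no user appears in both lists before the file is uploaded. It is an illustration under stated assumptions: the function name and the `allow_users` and `deny_users` dictionary keys are hypothetical, and the check covers users only, not groups.

```python
import csv
import io

HEADER = ["_question", "_answer", "_acl_user_allow", "_acl_user_deny"]


def build_custom_faq_csv(faqs):
    """Return custom FAQ CSV text with a header row and user ACL columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\r\n")
    writer.writerow(HEADER)
    for faq in faqs:
        allow = faq.get("allow_users", [])
        deny = faq.get("deny_users", [])
        overlap = set(allow) & set(deny)
        if overlap:
            # A user in both lists would be rejected, so fail early.
            raise ValueError(f"users in both allow and deny lists: {sorted(overlap)}")
        # Multi-value lists contain commas, so the writer quotes them.
        writer.writerow([faq["question"], faq["answer"], ",".join(allow), ",".join(deny)])
    return buffer.getvalue()
```

A list with a single entry contains no comma, so the writer leaves it unquoted, which matches the rule for single-entry string lists.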

JSON ﬁle

You can use a JSON ﬁle to provide questions, answers, and ﬁelds for your index. You can add any of

the Amazon Kendra reserved ﬁelds or custom ﬁelds to the FAQ.

The following is the schema for the JSON ﬁle.

{

"SchemaVersion": 1,

"FaqDocuments": [

{

"Question": string,

"Answer": string,

"Attributes": {

string: object

additional attributes

},

"AccessControlList": [

{

"Name": string,

"Type": enum( "GROUP" | "USER" ),

"Access": enum( "ALLOW" | "DENY" )

},

additional user context

]

},

additional FAQ documents

]

}

The following example JSON file shows two FAQ documents. One of the documents has the required question and answer only. The other document also includes additional fields and user context (access control) information.

{

"SchemaVersion": 1,

"FaqDocuments": [

{

"Question": "How many free clinics are in Spokane WA?",

"Answer": "13"

},

{

"Question": "How many free clinics are there in Mountain View Missouri?",

"Answer": "7",

"Attributes": {

"_source_uri": "https://s3.region.company.com/bucket-name/directory/faq.csv",

"_category": "Charitable Clinics"

},

"AccessControlList": [

{

"Name": "[email protected]",

"Type": "USER",

"Access": "ALLOW"

},

{

"Name": "Admin",

"Type": "GROUP",

"Access": "ALLOW"

}

]

}

]

}

The following lists the types of custom ﬁelds:

• Date—A JSON string value with ISO 8601-encoded date and time values. For example,

2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25, 2012, at 12:30PM

(plus 10 seconds) in the Central European Time time zone.

• Long—A JSON number value, such as 1234.

• String—A JSON string value (for example, "custom attribute").

• String list—A JSON array of string values (for example, ["item1", "item2", "item3"]).

A JSON ﬁle can contain user access control ﬁelds. You can use these ﬁelds to limit access to the

FAQ to certain users and groups. To ﬁlter on user context, the user must provide user and group

information in the query. Otherwise, all relevant FAQs are returned. For more information, see User

context ﬁltering.

You can include a user or a group in either an allow list or a deny list, but not both. For example, the same user can't be allowed individually and also denied through a group. If you include a user or group in both, you receive an error.

The following is an example of including user access control in a JSON FAQ file.

"AccessControlList": [

{

"Name": "group or user name",

"Type": "GROUP | USER",

"Access": "ALLOW | DENY"

},

additional user context

]
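A FAQ file in this JSON format can also be assembled in code and serialized with a JSON library. The following Python sketch follows the schema shown earlier. The helper names are hypothetical, and the sample values are taken from the examples in this section.

```python
import json


def faq_document(question, answer, attributes=None, access_control=None):
    """Build one entry for the FaqDocuments array."""
    doc = {"Question": question, "Answer": answer}
    if attributes:
        doc["Attributes"] = attributes
    if access_control:
        # Each entry is a (name, "GROUP" or "USER", "ALLOW" or "DENY") tuple.
        doc["AccessControlList"] = [
            {"Name": name, "Type": principal_type, "Access": access}
            for name, principal_type, access in access_control
        ]
    return doc


def build_faq_json(documents):
    """Serialize FAQ documents using schema version 1."""
    return json.dumps({"SchemaVersion": 1, "FaqDocuments": documents}, indent=2)


faq_json = build_faq_json([
    faq_document("How many free clinics are in Spokane WA?", "13"),
    faq_document(
        "How many free clinics are there in Mountain View Missouri?",
        "7",
        attributes={"_category": "Charitable Clinics"},
        access_control=[("Admin", "GROUP", "ALLOW")],
    ),
])
print(faq_json)
```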

Using your FAQ ﬁle

After you store your FAQ input ﬁle in an S3 bucket, you use the console or the CreateFaq API to

put the questions and answers into your index. If you want to update a FAQ, delete the FAQ and

create it again. You use the DeleteFaq API to delete a FAQ.

You must provide an IAM role that has access to the S3 bucket that contains your source ﬁles. You

specify the role in the console or in the RoleArn parameter. The following is an example of adding

a FAQ ﬁle to an index.

Python

import boto3

kendra = boto3.client("kendra")

# Provide the index ID

index_id = "index-id"

# Provide the IAM role ARN required to index documents in an S3 bucket

role_arn = "arn:aws:iam::${accountId}:role/${roleName}"

# Provide the S3 bucket path information to the FAQ file

faq_path = {

"Bucket": "bucket-name",

"Key": "FreeClinicsUSA.csv"

}

response = kendra.create_faq(

S3Path = faq_path,

Name = "FreeClinicsUSA",

IndexId = index_id,

RoleArn = role_arn

)

print(response)

Java

package com.amazonaws.kendra;

import software.amazon.awssdk.services.kendra.KendraClient;

import software.amazon.awssdk.services.kendra.model.CreateFaqRequest;

import software.amazon.awssdk.services.kendra.model.CreateFaqResponse;

import software.amazon.awssdk.services.kendra.model.S3Path;

public class AddFaqExample {

public static void main(String[] args) {

KendraClient kendra = KendraClient.builder().build();

String indexId = "yourIndexId";

String roleArn = "your role for accessing S3 files";

CreateFaqRequest createFaqRequest = CreateFaqRequest

.builder()

.indexId(indexId)

.name("FreeClinicsUSA")

.roleArn(roleArn)

.s3Path(

S3Path

.builder()

.bucket("an-aws-kendra-amzn-s3-demo-bucket")

.key("FreeClinicsUSA.csv")

.build())

.build();

CreateFaqResponse response = kendra.createFaq(createFaqRequest);

System.out.println(String.format("The result of creating FAQ: %s",

response));

}

}
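As noted earlier in this section, you update a FAQ by deleting it and creating it again. The following Python sketch chains the two calls. The function name is hypothetical and `kendra` is assumed to be a boto3 Kendra client. Because the operations are asynchronous, you might need to wait for the deletion to finish before the create call succeeds; that wait is not shown here.

```python
def replace_faq(kendra, index_id, old_faq_id, name, s3_path, role_arn):
    """Delete an existing FAQ, then create it again from the updated file.

    Returns the ID of the newly created FAQ.
    """
    kendra.delete_faq(Id=old_faq_id, IndexId=index_id)
    response = kendra.create_faq(
        IndexId=index_id,
        Name=name,
        S3Path=s3_path,
        RoleArn=role_arn,
    )
    return response["Id"]
```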

FAQ ﬁles in languages other than English

You can index a FAQ in a supported language. If you don't specify a language, Amazon Kendra indexes FAQs in English by default. You can specify the language code when you call the CreateFaq operation, or you can include the language code as a field in the FAQ metadata. If a FAQ doesn't have a language code specified in a metadata field, the FAQ is indexed using the language code specified when you call the CreateFaq operation. To index a FAQ document in a supported language in the console, go to FAQs and select Add FAQ, then choose a language from the Language dropdown.
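With the SDK, the language code is passed as one more request parameter. The following Python sketch builds the keyword arguments for a CreateFaq call. The function name and the sample file name are hypothetical, and you should check the `FileFormat` and `LanguageCode` values you pass against the API reference.

```python
def create_faq_request(index_id, name, bucket, key, role_arn,
                       file_format="CSV", language_code=None):
    """Build keyword arguments for a boto3 create_faq call."""
    request = {
        "IndexId": index_id,
        "Name": name,
        "S3Path": {"Bucket": bucket, "Key": key},
        "RoleArn": role_arn,
        "FileFormat": file_format,
    }
    # Leave LanguageCode out to accept the default language (English).
    if language_code is not None:
        request["LanguageCode"] = language_code
    return request


request = create_faq_request(
    "index-id", "ClinicasGratuitas", "bucket-name", "ClinicasGratuitas.csv",
    "arn:aws:iam::${accountId}:role/${roleName}", language_code="es",
)
# With a real client: kendra.create_faq(**request)
```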

Creating custom document ﬁelds

You can create custom attributes or ﬁelds for your documents in your Amazon Kendra index. For

example, you can create a custom ﬁeld or attribute called "Department" with the values of "HR",

"Sales", and "Manufacturing". If you map these custom ﬁelds or attributes to your Amazon Kendra

index, you can use them to ﬁlter the search results to include documents by the "HR" department

attribute, for example.

Before you can use a custom ﬁeld or attribute, you must ﬁrst create the ﬁeld in the index. Use the

console to edit the data source ﬁeld mappings to add a custom ﬁeld or use the UpdateIndex API to

create the index ﬁeld. You cannot change the ﬁeld data type once you have created the ﬁeld.

For most data sources, you map ﬁelds in the external data source to the corresponding ﬁelds in

Amazon Kendra. For more information, see Mapping data source ﬁelds. For S3 data sources, you

can create custom ﬁelds or attributes using a JSON metadata ﬁle.

You can create up to 500 custom ﬁelds or attributes.

You can also use Amazon Kendra reserved or common ﬁelds. For more information, see Document

attributes or ﬁelds.

Topics

• Updating custom document ﬁelds

Updating custom document ﬁelds

With the UpdateIndex API, you add custom ﬁelds or attributes using the

DocumentMetadataConfigurationUpdates parameter.

The following JSON example uses DocumentMetadataConfigurationUpdates to add a ﬁeld

called "Department" to the index.

"DocumentMetadataConfigurationUpdates": [

{

"Name": "Department",

"Type": "STRING_VALUE"

}

]
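The same update can be made from Python. The following sketch is an illustration only: the helper names and the chosen `Search` settings are assumptions, and `kendra` is expected to be a boto3 Kendra client.

```python
def string_field(name, facetable=True, displayable=True):
    """Describe a STRING_VALUE index field for UpdateIndex."""
    return {
        "Name": name,
        "Type": "STRING_VALUE",
        "Search": {
            "Facetable": facetable,
            "Searchable": False,
            "Displayable": displayable,
            "Sortable": False,
        },
    }


def add_index_fields(kendra, index_id, fields):
    """Add custom fields to an index with the UpdateIndex API."""
    return kendra.update_index(
        Id=index_id,
        DocumentMetadataConfigurationUpdates=fields,
    )
```

For example, `add_index_fields(kendra, "index-id", [string_field("Department")])` would add the "Department" field. Remember that a field's data type can't be changed after the field is created.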

The following sections include examples for adding custom attributes or fields using the BatchPutDocument API and for an Amazon S3 data source.

Topics

• Adding custom attributes or ﬁelds with the BatchPutDocument API

• Adding custom attributes or ﬁelds to an Amazon S3 data source

Adding custom attributes or ﬁelds with the BatchPutDocument API

When you use the BatchPutDocument API to add a document to your index, you specify custom

ﬁelds or attributes as part of Attributes. You can add multiple ﬁelds or attributes when you call

the API. You can create up to 500 custom ﬁelds or attributes. The following example is a custom

ﬁeld or attribute that adds "Department" to a document.

"Attributes":

{

"Department": "HR",

"_category": "Vacation policy"

}
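The example above shows the attributes as a plain mapping. In an SDK request, each attribute is passed as a key with a typed value, so a small conversion helper can be useful. The following Python sketch assumes the boto3 request shape of `Key` plus a `Value` holding one of `StringValue`, `StringListValue`, `LongValue`, or `DateValue`; verify this shape against the API reference. The helper name is hypothetical.

```python
from datetime import datetime


def to_document_attributes(fields):
    """Convert a plain mapping into a list of typed document attributes."""
    attributes = []
    for key, value in fields.items():
        if isinstance(value, bool):
            # bool is a subclass of int, so reject it before the int check.
            raise TypeError(f"unsupported attribute type for {key!r}: bool")
        if isinstance(value, str):
            typed = {"StringValue": value}
        elif isinstance(value, int):
            typed = {"LongValue": value}
        elif isinstance(value, datetime):
            typed = {"DateValue": value}
        elif isinstance(value, (list, tuple)):
            typed = {"StringListValue": [str(item) for item in value]}
        else:
            raise TypeError(f"unsupported attribute type for {key!r}")
        attributes.append({"Key": key, "Value": typed})
    return attributes


attributes = to_document_attributes({"Department": "HR", "_category": "Vacation policy"})
```

The resulting list would go in the `Attributes` entry of a document passed to `batch_put_document`.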

Adding custom attributes or ﬁelds to an Amazon S3 data source

When you use an S3 bucket as a data source for your index, you add metadata to the documents

with companion metadata ﬁles. You place the metadata JSON ﬁles in a directory structure that is

parallel to your documents. For more information, see S3 document metadata.

You specify custom fields or attributes in the Attributes JSON structure. You can create up to 500 custom fields or attributes. The following example uses Attributes to define three custom fields or attributes and one reserved field.

"Attributes": {

"brand": "Amazon Basics",

"price": 1595,

"_category": "sports",

"subcategories": ["outdoors", "electronics"]

}

The following steps walk you through adding custom attributes to an Amazon S3 data source.

Topics

• Step 1: Create an Amazon Kendra index

• Step 2: Update index to add custom document ﬁelds

• Step 3: Create an Amazon S3 data source and map data source ﬁelds to custom attributes

Step 1: Create an Amazon Kendra index

Follow the steps in Creating an index to create your Amazon Kendra index.

Step 2: Update index to add custom document ﬁelds

After creating an index, you add ﬁelds to it. The following procedure shows how to add ﬁelds to an

index using the console and the CLI.

Console

To create index ﬁelds

1. Make sure you've created an index.

2. Then, from the left navigation menu, from Data management, choose Facet deﬁnition.

3. In Index ﬁeld settings guide, from Index ﬁelds, choose Add ﬁeld to add custom ﬁelds.

4. In the Add index ﬁeld dialog box, do the following:

• Field name – Add a ﬁeld name.

• Data type – Select data type, whether String, String list, or Date.

• Usage types – Select one or more usage types: Facetable, Searchable, Displayable, or Sortable.

Then, select Add.

Repeat the last step for any other ﬁelds you want to map.

CLI

aws kendra update-index \

--region $region \

--endpoint-url $endpoint \

--id $indexId \

--document-metadata-configuration-updates \

"[

{

"Name": "string",

"Type": "STRING_VALUE"|"STRING_LIST_VALUE"|"LONG_VALUE"|"DATE_VALUE",

"Relevance": {

"Freshness": true|false,

"Importance": integer,

"Duration": "string",

"RankOrder": "ASCENDING"|"DESCENDING",

"ValueImportanceMap": {"string": integer

...}

},

"Search": {

"Facetable": true|false,

"Searchable": true|false,

"Displayable": true|false,

"Sortable": true|false

}

}

...

]"

Step 3: Create an Amazon S3 data source and map data source ﬁelds to custom attributes

To create an Amazon S3 data source and map ﬁelds to it, follow the instructions in Amazon S3.

If you're using the API, use the fieldMappings attribute under configuration when you use

the CreateDataSource API.

For an overview of how data source ﬁelds are mapped, see Mapping data source ﬁelds.

Controlling user access to documents with tokens

You can control which users or groups can access certain documents in your index or see certain

documents in their search results. This is called user context ﬁltering. It is a kind of personalized

search with the beneﬁt of controlling access to documents. For example, not all teams that search

the company portal for information should access top-secret company documents, nor are these

documents relevant to all users. Only speciﬁc users or groups of teams given access to top-secret

documents should see these documents in their search results.

Amazon Kendra supports token-based user access control using the following token types:

• OpenID

• JWT with a shared secret

• JWT with a public key

• JSON

Amazon Kendra delivers highly secure enterprise search for your search applications. Your

search results reﬂect the security model of your organization. Customers are responsible for

authenticating and authorizing users to gain access to their search application. At search time,

the Amazon Kendra service ﬁlters search results based on user ID provided by the customer's

search application, and document access control lists (ACLs) collected by the Amazon Kendra

connectors during crawl/indexing time. The search results return URLs pointing back to the original

document repositories plus short excerpts. Access to the full document is still enforced by the

original repository.
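At search time, your application passes the user's token with the query so that Amazon Kendra can filter the results. The following Python sketch shows the idea. The function name is hypothetical, `kendra` is assumed to be a boto3 Kendra client, and the token is whatever your identity provider issued to the signed-in user.

```python
def search_with_token(kendra, index_id, query_text, token):
    """Run a query filtered by the user context carried in a token.

    Returns a list of (title, URI) tuples for the results the user may see.
    """
    response = kendra.query(
        IndexId=index_id,
        QueryText=query_text,
        UserContext={"Token": token},
    )
    return [
        (item.get("DocumentTitle", {}).get("Text"), item.get("DocumentURI"))
        for item in response.get("ResultItems", [])
    ]
```

Authenticating the user and obtaining the token remain the responsibility of your search application.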

Topics

• Using OpenID

• Using a JSON Web Token (JWT) with a shared secret

• Using a JSON Web Token (JWT) with a public key

• Using JSON

Using OpenID

To configure an Amazon Kendra index to use an OpenID token for access control, you need the JWKS (JSON Web Key Set) URL from the OpenID provider. In most cases, if the provider follows OpenID discovery, the JWKS URL has the following format: https://domain-name/.well-known/jwks.json.

The following examples show how to use an OpenID token for user access control when you create

an index.

Console

1. Choose Create index to start creating a new index.

2. On the Specify index details page, give your index a name and a description.

3. For IAM role, select a role or select Create a new role and specify a role name to create a new role. The IAM role will have the prefix "AmazonKendra-".

4. Leave all of the other ﬁelds at their defaults. Choose Next.

5. In the Conﬁgure user access control page, under Access control settings, choose Yes to

use tokens for access control.

6. Under Token conﬁguration, select OpenID as the Token type.

7. Specify a Signing key URL. The URL should point to a set of JSON web keys.

8. (Optional) Under Advanced configuration:

a. Specify a Username to use in the ACL check.

b. Specify one or more Groups to use in the ACL check.

c. Specify the Issuer that will validate the token issuer.

d. Specify the Client Id(s). You must specify a regular expression that matches the audience in the JWT.

9. In the Provisioning details page, choose Developer edition.

10. Choose Create to create your index.

11. Wait for your index to be created. Amazon Kendra provisions the hardware for your index.

This operation can take some time.

CLI

To create an index with the AWS CLI using a JSON input ﬁle, ﬁrst create a JSON ﬁle with your

desired parameters:

{

"Name": "user-context",

"Edition": "ENTERPRISE_EDITION",

"RoleArn": "arn:aws:iam::account-id:role/my-role",

"UserTokenConfigurations": [

{

"JwtTokenTypeConfiguration": {

"KeyLocation": "URL",

"Issuer": "optional: specify the issuer url",

"ClaimRegex": "optional: regex to validate claims in the token",

"UserNameAttributeField": "optional: user",

"GroupAttributeField": "optional: group",

"URL": "https://example.com/.well-known/jwks.json"

}

}

],

"UserContextPolicy": "USER_TOKEN"

}

You can override the default user and group ﬁeld names. The default value for

UserNameAttributeField is "user". The default value for GroupAttributeField is

"groups".

Next, call create-index using the input ﬁle. For example, if the name of your JSON ﬁle is

create-index-openid.json, you can use the following:

aws kendra create-index --cli-input-json file://create-index-openid.json

Python

import boto3

kendra = boto3.client("kendra")

response = kendra.create_index(

Name='user-context',

Edition='ENTERPRISE_EDITION',

RoleArn='arn:aws:iam::account-id:role/my-role',

UserTokenConfigurations=[

{

"JwtTokenTypeConfiguration": {

"KeyLocation": "URL",

"Issuer": "optional: specify the issuer url",

"ClaimRegex": "optional: regex to validate claims in the token",

"UserNameAttributeField": "optional: user",

"GroupAttributeField": "optional: group",

"URL": "https://example.com/.well-known/jwks.json"

}

}

],

UserContextPolicy='USER_TOKEN'

)
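When you choose values for UserNameAttributeField and GroupAttributeField, it helps to look at the claims your tokens actually carry. The following Python sketch decodes the payload of a JWT for inspection. It doesn't verify the signature, so use it for debugging only. The function names are hypothetical, and the default field names are the ones given in this section.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload segment of a JWT without verifying the signature."""
    payload = token.split(".")[1]
    # JWT segments omit base64 padding, so restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def user_and_groups(claims, user_field="user", group_field="groups"):
    """Read the user name and group list from the decoded claims."""
    return claims.get(user_field), claims.get(group_field, [])
```

If the claims use different names, pass those names as `user_field` and `group_field`, and set the same names in the index configuration.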

Using a JSON Web Token (JWT) with a shared secret

The following examples show how to use a JSON Web Token (JWT) with a shared secret for user access control when you create an index.

Console

1. Choose Create index to start creating a new index.

2. On the Specify index details page, give your index a name and a description.

3. For IAM role, select a role or select Create a new role and specify a role name to create a new role. The IAM role will have the prefix "AmazonKendra-".

4. Leave all of the other ﬁelds at their defaults. Choose Next.

5. In the Conﬁgure user access control page, under Access control settings, choose Yes to

use tokens for access control.

6. Under Token conﬁguration, select JWT with shared secret as the Token type.

7. Under Parameters for signing shared secret, choose the Type of secret. You can use an

existing AWS Secrets Manager shared secret or create a new shared secret.

To create a new shared secret, choose New and then follow these steps:

a. Under New AWS Secrets Manager secret, specify a Secret name. The prefix AmazonKendra- will be added when you save the secret.

b. Specify a Key ID. The key id is a hint that indicates which key was used to secure the

JSON web signature of the token.

c. Choose the signing Algorithm for the token. This is the cryptographic algorithm used

to secure the ID token. For more information on RSA, see RSA Cryptography.

d. Specify a Shared secret by entering a base64 URL encoded secret. You can also select

Generate secret to have a secret generated for you. You must ensure the secret is a

base64 URL encoded secret.

e. (Optional) Specify when the shared secret is valid. You can specify the date and time a

secret is valid from, valid to, or both. The secret will be valid in the interval speciﬁed.

f. Select Save secret to save the new secret.

8. (Optional) Under Advanced conﬁguration:

a. Specify a Username to use in the ACL check.

b. Specify one or more Groups to use in the ACL check.

c. Specify the Issuer that will validate the token issuer.

d. Specify the Claim ID(s). You must specify a regular expression that matches the

audience in the JWT.

9. In the Provisioning details page, choose Developer edition.

10. Choose Create to create your index.

11. Wait for your index to be created. Amazon Kendra provisions the hardware for your index.

This operation can take some time.

CLI

You can use a JWT with a shared secret stored in AWS Secrets Manager. The secret must be a base64 URL encoded secret. You need the Secrets Manager ARN, and your Amazon Kendra role must have access to GetSecretValue on the Secrets Manager resource. If you are encrypting the Secrets Manager resource with AWS KMS, the role must also have access to the decrypt action.
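As a rough illustration of the permissions described above, the following Python sketch builds an IAM policy document that allows GetSecretValue on one secret and, optionally, the decrypt action on one KMS key. The helper name build_secret_access_policy and the ARNs are hypothetical placeholders, not values from this guide. Attach a policy like this to your Amazon Kendra role only after adapting it to your own resources.

```python
import json


def build_secret_access_policy(secret_arn, kms_key_arn=None):
    """Return an IAM policy document (as a dict) for reading one secret.

    secret_arn and kms_key_arn are placeholders supplied by the caller.
    The KMS statement is only added when the secret is encrypted with
    a customer managed AWS KMS key.
    """
    statements = [
        {
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": [secret_arn],
        }
    ]
    if kms_key_arn is not None:
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": [kms_key_arn],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    policy = build_secret_access_policy(
        "arn:aws:secretsmanager:us-west-2:account-id:secret:my-user-context-secret",
        "arn:aws:kms:us-west-2:account-id:key/key-id",
    )
    print(json.dumps(policy, indent=2))
```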

To create an index with the AWS CLI using a JSON input ﬁle, ﬁrst create a JSON ﬁle with your

desired parameters:

{
    "Name": "user-context",
    "Edition": "ENTERPRISE_EDITION",
    "RoleArn": "arn:aws:iam::account-id:role/my-role",
    "UserTokenConfigurations": [
        {
            "JwtTokenTypeConfiguration": {
                "KeyLocation": "SECRET_MANAGER",
                "Issuer": "optional: specify the issuer url",
                "ClaimRegex": "optional: regex to validate claims in the token",
                "UserNameAttributeField": "optional: user",
                "GroupAttributeField": "optional: group",
                "SecretManagerArn": "arn:aws:secretsmanager:us-west-2:account-id:secret:/my-user-context-secret"
            }
        }
    ],
    "UserContextPolicy": "USER_TOKEN"
}

You can override the default user and group ﬁeld names. The default value for

UserNameAttributeField is "user". The default value for GroupAttributeField is

"groups".

Next, call create-index using the input ﬁle. For example, if the name of your JSON ﬁle is

create-index-openid.json, you can use the following:

aws kendra create-index --cli-input-json file://create-index-openid.json

The secret must have the following format in AWS Secrets Manager:

{
    "keys": [
        {
            "kid": "key_id",
            "alg": "HS256|HS384|HS512",
            "kty": "OCT",
            "use": "sig", //this value can be sig only for now
            "k": "secret",
            "nbf": "ISO 8601 date format",
            "exp": "ISO 8601 date format"
        }
    ]
}

For more information about JWT, see jwt.io.
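As an illustration of the secret format above, the following Python sketch generates a random secret, base64 URL encodes it, and assembles the JSON document to store in AWS Secrets Manager. The helper name build_shared_secret, the 32-byte secret length, and the exact timestamp rendering used for "nbf" and "exp" are assumptions made for this example; check them against what your index accepts.

```python
import base64
import json
import secrets
from datetime import datetime, timedelta, timezone

ALLOWED_ALGORITHMS = ("HS256", "HS384", "HS512")


def build_shared_secret(key_id, algorithm="HS256", valid_days=None, raw_secret=None):
    """Build the shared-secret document in the format shown above.

    raw_secret: bytes to use as the secret; 32 random bytes are generated
    when omitted (the length is an assumption of this sketch).
    valid_days: when set, adds "nbf" (now) and "exp" (now + valid_days).
    """
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError("algorithm must be one of %s" % (ALLOWED_ALGORITHMS,))
    if raw_secret is None:
        raw_secret = secrets.token_bytes(32)

    key = {
        "kid": key_id,
        "alg": algorithm,
        "kty": "OCT",
        "use": "sig",
        # The secret must be base64 URL encoded.
        "k": base64.urlsafe_b64encode(raw_secret).decode("ascii"),
    }
    if valid_days is not None:
        now = datetime.now(timezone.utc)
        key["nbf"] = now.isoformat()
        key["exp"] = (now + timedelta(days=valid_days)).isoformat()
    return {"keys": [key]}


if __name__ == "__main__":
    print(json.dumps(build_shared_secret("key_id", valid_days=30), indent=2))
```

The resulting JSON string is what you would save as the secret value; the same secret bytes are then used by your token issuer to sign the JWT.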

Python

You can use a JWT with a shared secret stored in AWS Secrets Manager. The secret must be a base64 URL encoded secret. You need the Secrets Manager ARN, and your Amazon Kendra role must have access to GetSecretValue on the Secrets Manager resource. If you are encrypting the Secrets Manager resource with AWS KMS, the role must also have access to the decrypt action.

import boto3

# Create the Amazon Kendra client.
kendra = boto3.client('kendra')

response = kendra.create_index(
    Name='user-context',
    Edition='ENTERPRISE_EDITION',
    RoleArn='arn:aws:iam::account-id:role/my-role',
    UserTokenConfigurations=[
        {
            "JwtTokenTypeConfiguration": {
                # The key is stored in AWS Secrets Manager, so the key
                # location is SECRET_MANAGER rather than URL.
                "KeyLocation": "SECRET_MANAGER",
                "Issuer": "optional: specify the issuer url",
                "ClaimRegex": "optional: regex to validate claims in the token",
                "UserNameAttributeField": "optional: user",
                "GroupAttributeField": "optional: group",
                "SecretManagerArn": "arn:aws:secretsmanager:us-west-2:account-id:secret:/my-user-context-secret"
            }
        }
    ],
    UserContextPolicy='USER_TOKEN'
)

Using a JSON Web Token (JWT) with a public key

The following examples show how to use JSON Web Token (JWT) with a public key for user access

control when you create an index. For more information about JWT, see jwt.io.

Console

1. Choose Create index to start creating a new index.

2. On the Specify index details page, give your index a name and a description.

3. For IAM role, select a role or select Create a new role and specify a role name to create a new role. The IAM role will have the prefix "AmazonKendra-".

4. Leave all of the other ﬁelds at their defaults. Choose Next.

5. In the Conﬁgure user access control page, under Access control settings, choose Yes to

use tokens for access control.

6. Under Token conﬁguration, select JWT with public key as the Token type.

7. Under Parameters for signing public key, choose the Type of secret. You can use an

existing AWS Secrets Manager secret or create a new secret.

To create a new secret, choose New and then follow these steps:

a. Under New AWS Secrets Manager secret, specify a Secret name. The preﬁx

AmazonKendra- will be added when you save the public key.

b. Specify a Key ID. The key id is a hint that indicates which key was used to secure the

JSON web signature of the token.

c. Choose the signing Algorithm for the token. This is the cryptographic algorithm used

to secure the ID token. For more information on RSA, see RSA Cryptography.

d. Under Certiﬁcate attributes, specify an optional Certiﬁcate chain. The certiﬁcate

chain is made up of a list of certiﬁcates. It begins with a server’s certiﬁcate and

terminates with the root certiﬁcate.

e. (Optional) Specify the Thumbprint or fingerprint. It should be a hash of a certificate, computed over all certificate data and its signature.

f. Specify the Exponent. This is the exponent value for the RSA public key. It is

represented as a Base64urlUInt-encoded value.

g. Specify the Modulus. This is the modulus value for the RSA public key. It is represented as a Base64urlUInt-encoded value.

h. Select Save key to save the new key.

8. (Optional) Under Advanced configuration:

a. Specify a Username to use in the ACL check.

b. Specify one or more Groups to use in the ACL check.

c. Specify the Issuer that will validate the token issuer.

d. Specify the Client Id(s). You must specify a regular expression that matches the audience in the JWT.

9. In the Provisioning details page, choose Developer edition.

10. Choose Create to create your index.

11. Wait for your index to be created. Amazon Kendra provisions the hardware for your index.

This operation can take some time.

CLI

You can use a JWT with a public key stored in AWS Secrets Manager. You need the Secrets Manager ARN, and your Amazon Kendra role must have access to GetSecretValue on the Secrets Manager resource. If you are encrypting the Secrets Manager resource with AWS KMS, the role must also have access to the decrypt action.

To create an index with the AWS CLI using a JSON input ﬁle, ﬁrst create a JSON ﬁle with your

desired parameters:

{
    "Name": "user-context",
    "Edition": "ENTERPRISE_EDITION",
    "RoleArn": "arn:aws:iam::account-id:role/my-role",
    "UserTokenConfigurations": [
        {
            "JwtTokenTypeConfiguration": {
                "KeyLocation": "SECRET_MANAGER",
                "Issuer": "optional: specify the issuer url",
                "ClaimRegex": "optional: regex to validate claims in the token",
                "UserNameAttributeField": "optional: user",
                "GroupAttributeField": "optional: group",
                "SecretManagerArn": "arn:aws:secretsmanager:us-west-2:account-id:secret:/my-user-context-secret"
            }
        }
    ],
    "UserContextPolicy": "USER_TOKEN"
}

You can override the default user and group ﬁeld names. The default value for

UserNameAttributeField is "user". The default value for GroupAttributeField is

"groups".

Next, call create-index using the input ﬁle. For example, if the name of your JSON ﬁle is

create-index-openid.json, you can use the following:

aws kendra create-index --cli-input-json file://create-index-openid.json

The secret must have the following format in Secrets Manager:

{
    "keys": [
        {
            "alg": "RS256|RS384|RS512",
            "kty": "RSA", //this can be RSA only for now
            "use": "sig", //this value can be sig only for now
            "n": "modulus of standard pem",
            "e": "exponent of standard pem",
            "kid": "key_id",
            "x5t": "certificate thumbprint for x.509 cert",
            "x5c": [
                "certificate chain"
            ]
        }
    ]
}

For more information about JWT, see jwt.io.
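The modulus ("n") and exponent ("e") in the format above are Base64urlUInt-encoded values. As a minimal sketch, assuming you already have the RSA modulus and exponent as Python integers, the following code encodes them and assembles the key document. The helper names are hypothetical, the optional "x5t" and "x5c" entries are left out, and the tiny modulus in the demo is for illustration only, not a usable key.

```python
import base64
import json


def base64url_uint(value):
    """Encode a non-negative integer as Base64urlUInt.

    The integer is written as big-endian bytes with no leading zero
    bytes, then base64 URL encoded with the trailing '=' padding removed.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def build_public_key_secret(key_id, modulus, exponent, algorithm="RS256"):
    """Build the public-key document in the format shown above."""
    if algorithm not in ("RS256", "RS384", "RS512"):
        raise ValueError("algorithm must be RS256, RS384, or RS512")
    return {
        "keys": [
            {
                "alg": algorithm,
                "kty": "RSA",
                "use": "sig",
                "n": base64url_uint(modulus),
                "e": base64url_uint(exponent),
                "kid": key_id,
            }
        ]
    }


if __name__ == "__main__":
    # 3233 is a toy modulus used only to keep the output short.
    print(json.dumps(build_public_key_secret("key_id", 3233, 65537), indent=2))
```

The common RSA public exponent 65537 encodes to "AQAB", which is a quick way to check that an encoder is behaving as expected.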

Python

import boto3

# Create the Amazon Kendra client.
kendra = boto3.client('kendra')

response = kendra.create_index(
    Name='user-context',
    Edition='ENTERPRISE_EDITION',
    RoleArn='arn:aws:iam::account-id:role/my-role',
    UserTokenConfigurations=[
        {
            "JwtTokenTypeConfiguration": {
                # The key is stored in AWS Secrets Manager, so the key
                # location is SECRET_MANAGER rather than URL.
                "KeyLocation": "SECRET_MANAGER",
                "Issuer": "optional: specify the issuer url",
                "ClaimRegex": "optional: regex to validate claims in the token",
                "UserNameAttributeField": "optional: user",
                "GroupAttributeField": "optional: group",
                "SecretManagerArn": "arn:aws:secretsmanager:us-west-2:account-id:secret:/my-user-context-secret"
            }
        }
    ],
    UserContextPolicy='USER_TOKEN'
)

Using JSON

The following examples show how to use JSON for user access control when you create an index.

Warning

The JSON token is a non-validated payload. This should only be used when requests to

Amazon Kendra come from a trusted server and never from a browser.

Console

1. Choose Create index to start creating a new index.

2. On the Specify index details page, give your index a name and a description.

3. For IAM role, select a role or select Create a new role and specify a role name to create a new role. The IAM role will have the prefix "AmazonKendra-".

4. Leave all of the other ﬁelds at their defaults. Choose Next.

5. In the Conﬁgure user access control page, under Access control settings, choose Yes to

use tokens for access control.

6. Under Token conﬁguration, select JSON as the Token type.

7. Specify a User name to use in the ACL check.

8. Specify one or more Groups to use in the ACL check.

9. Choose Next.

10. In the Provisioning details page, choose Developer edition.

11. Choose Create to create your index.

12. Wait for your index to be created. Amazon Kendra provisions the hardware for your index.

This operation can take some time.

CLI

To create an index with the AWS CLI using a JSON input ﬁle, ﬁrst create a JSON ﬁle with your

desired parameters:

{
    "Name": "user-context",
    "Edition": "ENTERPRISE_EDITION",
    "RoleArn": "arn:aws:iam::account-id:role/my-role",
    "UserTokenConfigurations": [
        {
            "JsonTokenTypeConfiguration": {
                "UserNameAttributeField": "user",
                "GroupAttributeField": "group"
            }
        }
    ],
    "UserContextPolicy": "USER_TOKEN"
}

Next, call create-index using the input ﬁle. For example, if the name of your JSON ﬁle is

create-index-openid.json, you can use the following:

aws kendra create-index --cli-input-json file://create-index-openid.json

If you are not using Open ID for AWS IAM Identity Center, you can send Amazon Kendra the token in JSON format. If you do, you must specify which field in the JSON token contains the user name and which field contains the groups. The group field values must be a JSON string array. For example, if you are using SAML, your token would be similar to the following:

{

"username" : "user1",

"groups": [

"group1",

"group2"

]

}

The TokenConfiguration would specify the user name and group ﬁeld names:

{

"UserNameAttributeField":"username",

"GroupAttributeField":"groups"

}
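As a small sketch of the token shape described above, the following Python helper serializes a user name and a list of groups into a JSON token, using the same field names that the token configuration declares. The helper name build_json_token is hypothetical. The sketch rejects a groups value that is not a list of strings, because the group field must be a JSON string array.

```python
import json


def build_json_token(user_name, groups, user_field="username", group_field="groups"):
    """Serialize a user name and groups into a JSON token string.

    user_field and group_field should match UserNameAttributeField and
    GroupAttributeField in the index's token configuration.
    """
    is_sequence = isinstance(groups, (list, tuple))
    if not is_sequence or not all(isinstance(g, str) for g in groups):
        raise TypeError("groups must be a list of strings")
    return json.dumps({user_field: user_name, group_field: list(groups)})


if __name__ == "__main__":
    token = build_json_token("user1", ["group1", "group2"])
    print(token)
    # The token string would then be supplied with a query, for example
    # through the Token field of the Query API's UserContext parameter.
```

Because the JSON token is not validated, build it only on a trusted server, as the warning at the start of this section notes.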

Python

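import boto3

# Create the Amazon Kendra client.
kendra = boto3.client('kendra')

response = kendra.create_index(
    Name='user-context',
    Edition='ENTERPRISE_EDITION',
    RoleArn='arn:aws:iam::account-id:role/my-role',
    UserTokenConfigurations=[
        {
            # JSON tokens use JsonTokenTypeConfiguration, matching the CLI example.
            "JsonTokenTypeConfiguration": {
                "UserNameAttributeField": "user",
                "GroupAttributeField": "group",
            }
        }
    ],
    UserContextPolicy='USER_TOKEN'
)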

Creating a data source connector

You can create a data source connector for Amazon Kendra to connect to and index your

documents. Amazon Kendra can connect to Microsoft SharePoint, Google Drive, and many other

providers. When you create a data source connector, you give Amazon Kendra the conﬁguration

information required to connect to your source repository. Unlike adding documents directly to an

index, you can periodically scan the data source to update the index.

For example, say that you have a repository of tax documents stored in an Amazon S3 bucket. From

time to time, existing documents are changed and new documents are added to the repository. If

you add the repository to Amazon Kendra as a data source, you can keep your index up to date by

setting up periodic synchronizations between your data source and index.

You can choose to update an index manually using the console or the StartDataSourceSyncJob API.

Otherwise, you set up a schedule to update an index and have it synchronize with your data source.

An index can have more than one data source. Each data source can have its own update schedule.

For example, you might update the index of your working documents daily, or even hourly, while

updating your archived documents manually whenever the archive changes.
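For the manual option, a sync can be started from code. The following is a minimal sketch that wraps the StartDataSourceSyncJob API as exposed by the AWS SDK for Python (Boto3). The function name start_manual_sync and the ID values are placeholders. The client is passed in as an argument so the function can be exercised with a stand-in object.

```python
def start_manual_sync(kendra_client, index_id, data_source_id):
    """Start an on-demand sync job and return its execution ID.

    kendra_client is expected to be a Boto3 Amazon Kendra client,
    for example boto3.client("kendra").
    """
    response = kendra_client.start_data_source_sync_job(
        Id=data_source_id,
        IndexId=index_id,
    )
    return response["ExecutionId"]


# Example usage (requires AWS credentials and existing resources):
#
#   import boto3
#   kendra = boto3.client("kendra")
#   execution_id = start_manual_sync(kendra, "index-id", "data-source-id")
#   print("Started sync job", execution_id)
```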

If you want to alter your document metadata or attributes and content during the document

ingestion process, see Amazon Kendra Custom Document Enrichment.

Note

Each document ID must be unique per index. You cannot create a data source to index

your documents with their unique IDs and then use the BatchPutDocument API to

index the same documents, or vice versa. You can delete a data source and then use

the BatchPutDocument API to index the same documents, or vice versa. Using the

BatchPutDocument and BatchDeleteDocument APIs in combination with an Amazon

Kendra data source connector for the same set of documents could cause inconsistencies

with your data. Instead, we recommend using the Amazon Kendra custom data source

connector.

Note

Files added to the index must be in a UTF-8 encoded byte stream. For more information on

documents in Amazon Kendra, see Documents.

Setting an update schedule

Conﬁgure your data source to periodically update with the console or by using the Schedule

parameter when you create or update a data source. The content of the parameter is a string that

holds either a cron-format schedule string or an empty string to indicate that the index is updated

on demand. For the format of a cron expression, see Schedule Expressions for Rules in the Amazon

CloudWatch Events User Guide. Amazon Kendra supports only cron expressions. It doesn't support

rate expressions.
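As a sketch of the rules above, the following Python helper prepares a value for the Schedule parameter. It passes an empty string through for on-demand updates, rejects rate expressions, and does a shallow shape check on cron expressions. The helper name schedule_parameter is hypothetical, and the assumption that a schedule is written as cron(...) with six space-separated fields comes from the CloudWatch Events format referenced above. The check does not validate the individual field values.

```python
import re

# Matches the outer cron(...) wrapper and captures the fields inside it.
_CRON_PATTERN = re.compile(r"^cron\((.+)\)$")


def schedule_parameter(expression=""):
    """Return a string suitable for the Schedule parameter.

    Raises ValueError for rate expressions or for strings that do not
    look like cron(<six fields>).
    """
    expression = expression.strip()
    if expression == "":
        # An empty string means the index is updated on demand.
        return ""
    if expression.startswith("rate("):
        raise ValueError("rate expressions are not supported; use a cron expression")
    match = _CRON_PATTERN.match(expression)
    if match is None or len(match.group(1).split()) != 6:
        raise ValueError("expected cron(<six space-separated fields>)")
    return expression


if __name__ == "__main__":
    # Every day at 08:00 UTC, in the CloudWatch Events cron format.
    print(schedule_parameter("cron(0 8 * * ? *)"))
```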

Setting a language

You can index all your documents in a data source in a supported language. You specify the

language code for all your documents in your data source when you call CreateDataSource. If a

document doesn't have a language code speciﬁed in a metadata ﬁeld, the document is indexed

using the language code that's speciﬁed for all documents at the data source level. If you don't

specify a language, Amazon Kendra indexes documents in a data source in English by default.

For more information on supported languages, including their codes, see Adding documents in

languages other than English.
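As a brief sketch of setting the language at the data source level, the following Python helper assembles keyword arguments for a CreateDataSource call and adds LanguageCode only when one is given. The helper name data_source_request and the example values are placeholders. When no language code is supplied, the request relies on the English default described above.

```python
def data_source_request(name, index_id, source_type, language_code=None, **extra):
    """Build keyword arguments for a CreateDataSource call.

    language_code applies to all documents in the data source that do
    not carry their own language code in a metadata field.
    """
    request = {"Name": name, "IndexId": index_id, "Type": source_type}
    if language_code:
        request["LanguageCode"] = language_code
    request.update(extra)
    return request


if __name__ == "__main__":
    # "es" is used here as an example language code.
    print(data_source_request("my-data-source", "index-id", "S3", language_code="es"))
    # With a Boto3 client, the result could be passed as:
    #   kendra.create_data_source(**request)
```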

You can also index all your documents in a data source in a supported language using the console. Go to Data sources and edit your data source, or choose Add data source if you're adding a new data source. On the Specify data source details page, choose a language from the Language dropdown. Then select Update, or continue to enter the configuration information to connect to your data source.

Data source connectors

This section shows you how to connect Amazon Kendra to supported databases and data source

repositories using Amazon Kendra in the AWS Management Console and the Amazon Kendra APIs.

Topics

• Data source template schemas

• Adobe Experience Manager

• Alfresco

• Aurora (MySQL)

• Aurora (PostgreSQL)

• Amazon FSx (Windows)

• Amazon FSx (NetApp ONTAP)

• Amazon RDS/Aurora

• Amazon RDS (Microsoft SQL Server)

• Amazon RDS (MySQL)

• Amazon RDS (Oracle)

• Amazon RDS (PostgreSQL)

• Amazon S3

• Amazon Kendra Web Crawler

• Amazon WorkDocs

• Box

• Conﬂuence

• Custom data source connector

• Dropbox

• Drupal

• GitHub

• Gmail

• Google Drive

• IBM DB2

• Jira

• Microsoft Exchange

• Microsoft OneDrive

• Microsoft SharePoint

• Microsoft SQL Server

• Microsoft Teams

• Microsoft Yammer

• MySQL

• Oracle Database

• PostgreSQL

• Quip

• Salesforce

• ServiceNow

• Slack

• Zendesk

Data source template schemas

The following are template schemas for data sources where templates are supported.

Topics

• Adobe Experience Manager template schema

• Amazon FSx (Windows) template schema

• Amazon FSx (NetApp ONTAP) template schema

• Alfresco template schema

• Aurora (MySQL) template schema

• Aurora (PostgreSQL) template schema

• Amazon RDS (Microsoft SQL Server) template schema

• Amazon RDS (MySQL) template schema

• Amazon RDS (Oracle) template schema

• Amazon RDS (PostgreSQL) template schema

• Amazon S3 template schema

• Amazon Kendra Web Crawler template schema

• Conﬂuence template schema

• Dropbox template schema

• Drupal template schema

• GitHub template schema

• Gmail template schema

• Google Drive template schema

• IBM DB2 template schema

• Microsoft Exchange template schema

• Microsoft OneDrive template schema

• Microsoft SharePoint template schema

• Microsoft SQL Server template schema

• Microsoft Teams template schema

• Microsoft Yammer template schema

• MySQL template schema

• Oracle Database template schema

• PostgreSQL template schema

• Salesforce template schema

• ServiceNow template schema

• Slack template schema

• Zendesk template schema

Adobe Experience Manager template schema

You include a JSON that contains the data source schema as part of the TemplateConﬁguration

object. You provide the Adobe Experience Manager host URL, the authentication type, and whether

you use Adobe Experience Manager (AEM) as a Cloud Service or AEM On-Premise as part of the

connection conﬁguration or repository endpoint details. Also, specify the type of data source as

AEM, a secret for your authentication credentials, and other necessary conﬁgurations. You then

specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. For more information, see Adobe

Experience Manager JSON schema.

The following table describes the parameters of the AEM JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

aemUrl The Adobe Experience Manager host URL.

For example, if you use AEM On-Premise,

you include the hostname and port:

https://hostname:port. Or, if you use AEM as a

Cloud Service, you can use the author URL:

https://author-xxxxxx-xxxxxxx.adobeaemcloud.com.

authType The type of authentication you use, whether

Basic or OAuth2.

deploymentType The type of Adobe Experience Manager that

you use, either CLOUD or ON_PREMISE .

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• page

• asset

A list of objects that map the attributes

or ﬁeld names of your Adobe Experience

Manager pages and assets to Amazon Kendra

index ﬁeld names. For more information, see

Mapping data source ﬁelds.

additionalProperties Additional conﬁguration options for your

content in your data source.

timeZoneId If you use AEM On-Premise and the time zone

of your server is diﬀerent than the time zone

of the Amazon Kendra AEM connector or

index, you can specify the server time zone to

align with the AEM connector or index.

The default time zone for AEM On-Premise

is the time zone of the Amazon Kendra AEM

connector or index. The default time zone for

AEM as a Cloud Service is Greenwich Mean

Time.

• pageRootPaths

• assetRootPaths

A list of root paths for pages and assets. For

example, the root path for a page could be

/content/sub and the root path for an asset

could be /content/sub/asset1.

crawlAssets

true to crawl assets.

crawlPages

true to crawl pages.

• pagePathInclusionPatterns

• pageNameInclusionPatterns

• assetPathInclusionPatterns

• assetTypeInclusionPatterns

• assetNameInclusionPatterns

A list of regular expression patterns to include

certain pages and assets in your Adobe

Experience Manager data source. Pages and

assets that match the patterns are included in

the index. Pages and assets that don't match

the patterns are excluded from the index. If a

page or asset matches both an inclusion and

exclusion pattern, the exclusion pattern takes

precedence, and the content isn't included in

the index.

• pagePathExclusionPatterns

• pageNameExclusionPatterns

• assetPathExclusionPatterns

• assetTypeExclusionPatterns

• assetNameExclusionPatterns

A list of regular expression patterns to exclude

certain pages and assets in your Adobe

Experience Manager data source. Pages and

assets that match the patterns are excluded

from the index. Pages and assets that don't

match the patterns are included in the index.

If a page or asset matches both an inclusion

and exclusion pattern, the exclusion pattern

takes precedence, and the content isn't

included in the index.

pageComponents A list of names for the speciﬁc page

components that you want to index.

contentFragmentVariations A list of names for the specific saved

variations of Adobe Experience Manager Content

Fragments that you want to index.

type

The type of data source. Specify AEM as your

data source type.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler

to sync identity/principal information on users

and groups with access to certain documents.

If identity crawler is turned oﬀ, all documents

can be publicly searched. If you want to use

access control for your documents and identity

crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user

and group access information.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of an AWS

Secrets Manager secret that contains the key-

value pairs required to connect to your Adobe

Experience Manager. For information on these

key-value pairs, see Connection instructions

for Adobe Experience Manager.

version The version of this template that is currently

supported.
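To show how the parameters in the table fit together, the following Python sketch assembles a partial Adobe Experience Manager template as a dictionary and checks the enumerated values listed above. The helper name build_aem_template, the sample field mapping names, the root path, and the version string "1.0.0" are assumptions made for this example. Only a subset of the fields is populated, and the sketch does not verify that this subset satisfies the full JSON schema.

```python
import json


def build_aem_template(aem_url, auth_type, deployment_type, secret_arn,
                       sync_mode="FULL_CRAWL", version="1.0.0"):
    """Assemble a partial AEM data source template as a dict.

    version="1.0.0" is an assumed placeholder; use the version that the
    template currently supports.
    """
    if not aem_url.startswith("https:"):
        raise ValueError("aemUrl must start with https:")
    if auth_type not in ("Basic", "OAuth2"):
        raise ValueError("authType must be Basic or OAuth2")
    if deployment_type not in ("CLOUD", "ON_PREMISE"):
        raise ValueError("deploymentType must be CLOUD or ON_PREMISE")
    if sync_mode not in ("FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"):
        raise ValueError("unsupported syncMode")

    # Hypothetical mapping from an AEM field to an index field.
    sample_mapping = {
        "indexFieldName": "aem_title",
        "indexFieldType": "STRING",
        "dataSourceFieldName": "title",
    }
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "aemUrl": aem_url,
                "authType": auth_type,
                "deploymentType": deployment_type,
            }
        },
        "repositoryConfigurations": {
            "page": {"fieldMappings": [sample_mapping]},
            "asset": {"fieldMappings": [dict(sample_mapping)]},
        },
        "additionalProperties": {"pageRootPaths": ["/content/sub"]},
        "type": "AEM",
        "syncMode": sync_mode,
        "secretArn": secret_arn,
        "version": version,
    }


if __name__ == "__main__":
    template = build_aem_template(
        "https://hostname:4502", "Basic", "ON_PREMISE",
        "arn:aws:secretsmanager:us-west-2:account-id:secret:my-aem-secret",
    )
    print(json.dumps(template, indent=2))
    # With a Boto3 client, the template could be supplied as:
    #   kendra.create_data_source(
    #       Name="my-aem-source", IndexId="index-id", Type="TEMPLATE",
    #       Configuration={"TemplateConfiguration": {"Template": template}},
    #   )
```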

Adobe Experience Manager JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties":

{

"connectionConfiguration": {

"type": "object",

"properties":

{

"repositoryEndpointMetadata":

{

"type": "object",

"properties":

{

"aemUrl":

{

"type": "string",

"pattern": "https:.*"

},

"authType": {

"type": "string",

"enum": ["Basic", "OAuth2"]

},

"deploymentType": {

"type": "string",

"enum": ["CLOUD","ON_PREMISE"]

}

},

"required":

[

"aemUrl",

"authType",

"deploymentType"

]

}

},

"required":

[

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties":

{

"page":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"asset":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties":

{

"timeZoneId": {

"type": "string",

"enum": [

"Africa/Abidjan",

"Africa/Accra",

"Africa/Addis_Ababa",

"Africa/Algiers",

"Africa/Asmara",

"Africa/Asmera",

"Africa/Bamako",

"Africa/Bangui",

"Africa/Banjul",

"Africa/Bissau",

"Africa/Blantyre",

"Africa/Brazzaville",

"Africa/Bujumbura",

"Africa/Cairo",

"Africa/Casablanca",

"Africa/Ceuta",

"Africa/Conakry",

"Africa/Dakar",

"Africa/Dar_es_Salaam",

"Africa/Djibouti",

"Africa/Douala",

"Africa/El_Aaiun",

"Africa/Freetown",

"Africa/Gaborone",

"Africa/Harare",

"Africa/Johannesburg",

"Africa/Juba",

"Africa/Kampala",

"Africa/Khartoum",

"Africa/Kigali",

"Africa/Kinshasa",

"Africa/Lagos",

"Africa/Libreville",

"Africa/Lome",

"Africa/Luanda",

"Africa/Lubumbashi",

"Africa/Lusaka",

"Africa/Malabo",

"Africa/Maputo",

"Africa/Maseru",

"Africa/Mbabane",

"Africa/Mogadishu",

"Africa/Monrovia",

"Africa/Nairobi",

"Africa/Ndjamena",

"Africa/Niamey",

"Africa/Nouakchott",

"Africa/Ouagadougou",

"Africa/Porto-Novo",

"Africa/Sao_Tome",

"Africa/Timbuktu",

"Africa/Tripoli",

"Africa/Tunis",

"Africa/Windhoek",

"America/Adak",

"America/Anchorage",

"America/Anguilla",

"America/Antigua",

"America/Araguaina",

"America/Argentina/Buenos_Aires",

"America/Argentina/Catamarca",

"America/Argentina/ComodRivadavia",

"America/Argentina/Cordoba",

"America/Argentina/Jujuy",

"America/Argentina/La_Rioja",

"America/Argentina/Mendoza",

"America/Argentina/Rio_Gallegos",

"America/Argentina/Salta",

"America/Argentina/San_Juan",

"America/Argentina/San_Luis",

"America/Argentina/Tucuman",

"America/Argentina/Ushuaia",

"America/Aruba",

"America/Asuncion",

"America/Atikokan",

"America/Atka",

"America/Bahia",

"America/Bahia_Banderas",

"America/Barbados",

"America/Belem",

"America/Belize",

"America/Blanc-Sablon",

"America/Boa_Vista",

"America/Bogota",

"America/Boise",

"America/Buenos_Aires",

"America/Cambridge_Bay",

"America/Campo_Grande",

"America/Cancun",

"America/Caracas",

"America/Catamarca",

"America/Cayenne",

"America/Cayman",

"America/Chicago",

"America/Chihuahua",

"America/Ciudad_Juarez",

"America/Coral_Harbour",

"America/Cordoba",

"America/Costa_Rica",

"America/Creston",

"America/Cuiaba",

"America/Curacao",

"America/Danmarkshavn",

"America/Dawson",

"America/Dawson_Creek",

"America/Denver",

"America/Detroit",

"America/Dominica",

"America/Edmonton",

"America/Eirunepe",

"America/El_Salvador",

"America/Ensenada",

"America/Fort_Nelson",

"America/Fort_Wayne",

"America/Fortaleza",

"America/Glace_Bay",

"America/Godthab",

"America/Goose_Bay",

"America/Grand_Turk",

"America/Grenada",

"America/Guadeloupe",

"America/Guatemala",

"America/Guayaquil",

"America/Guyana",

"America/Halifax",

"America/Havana",

"America/Hermosillo",

"America/Indiana/Indianapolis",

"America/Indiana/Knox",

"America/Indiana/Marengo",

"America/Indiana/Petersburg",

"America/Indiana/Tell_City",

"America/Indiana/Vevay",

"America/Indiana/Vincennes",

"America/Indiana/Winamac",

"America/Indianapolis",

"America/Inuvik",

"America/Iqaluit",

"America/Jamaica",

"America/Jujuy",

"America/Juneau",

"America/Kentucky/Louisville",

"America/Kentucky/Monticello",

"America/Knox_IN",

"America/Kralendijk",

"America/La_Paz",

"America/Lima",

"America/Los_Angeles",

"America/Louisville",

"America/Lower_Princes",

"America/Maceio",

"America/Managua",

"America/Manaus",

"America/Marigot",

"America/Martinique",

"America/Matamoros",

"America/Mazatlan",

"America/Mendoza",

"America/Menominee",

"America/Merida",

"America/Metlakatla",

"America/Mexico_City",

"America/Miquelon",

"America/Moncton",

"America/Monterrey",

"America/Montevideo",

"America/Montreal",

"America/Montserrat",

"America/Nassau",

"America/New_York",

"America/Nipigon",

"America/Nome",

"America/Noronha",

"America/North_Dakota/Beulah",

"America/North_Dakota/Center",

"America/North_Dakota/New_Salem",

"America/Nuuk",

"America/Ojinaga",

"America/Panama",

"America/Pangnirtung",

"America/Paramaribo",

"America/Phoenix",

"America/Port-au-Prince",

"America/Port_of_Spain",

"America/Porto_Acre",

"America/Porto_Velho",

"America/Puerto_Rico",

"America/Punta_Arenas",

"America/Rainy_River",

"America/Rankin_Inlet",

"America/Recife",

"America/Regina",

"America/Resolute",

"America/Rio_Branco",

"America/Rosario",

"America/Santa_Isabel",

"America/Santarem",

"America/Santiago",

"America/Santo_Domingo",

"America/Sao_Paulo",

"America/Scoresbysund",

"America/Shiprock",

"America/Sitka",

"America/St_Barthelemy",

"America/St_Johns",

"America/St_Kitts",

"America/St_Lucia",

"America/St_Thomas",

"America/St_Vincent",

"America/Swift_Current",

"America/Tegucigalpa",

"America/Thule",

"America/Thunder_Bay",

"America/Tijuana",

"America/Toronto",

"America/Tortola",

"America/Vancouver",

"America/Virgin",

"America/Whitehorse",

"America/Winnipeg",

"America/Yakutat",

"America/Yellowknife",

"Antarctica/Casey",

"Antarctica/Davis",

"Antarctica/DumontDUrville",

"Antarctica/Macquarie",

"Antarctica/Mawson",

"Antarctica/McMurdo",

"Antarctica/Palmer",

"Antarctica/Rothera",

"Antarctica/South_Pole",

"Antarctica/Syowa",

"Antarctica/Troll",

"Antarctica/Vostok",

"Arctic/Longyearbyen",

"Asia/Aden",

"Asia/Almaty",

"Asia/Amman",

"Asia/Anadyr",

"Asia/Aqtau",

"Asia/Aqtobe",

"Asia/Ashgabat",

"Asia/Ashkhabad",

"Asia/Atyrau",

"Asia/Baghdad",

"Asia/Bahrain",

"Asia/Baku",

"Asia/Bangkok",

"Asia/Barnaul",

"Asia/Beirut",

"Asia/Bishkek",

"Asia/Brunei",

"Asia/Calcutta",

"Asia/Chita",

"Asia/Choibalsan",

"Asia/Chongqing",

"Asia/Chungking",

"Asia/Colombo",

"Asia/Dacca",

"Asia/Damascus",

"Asia/Dhaka",

"Asia/Dili",

"Asia/Dubai",

"Asia/Dushanbe",

"Asia/Famagusta",

"Asia/Gaza",

"Asia/Harbin",

"Asia/Hebron",

"Asia/Ho_Chi_Minh",

"Asia/Hong_Kong",

"Asia/Hovd",

"Asia/Irkutsk",

"Asia/Istanbul",

"Asia/Jakarta",

"Asia/Jayapura",

"Asia/Jerusalem",

"Asia/Kabul",

"Asia/Kamchatka",

"Asia/Karachi",

"Asia/Kashgar",

"Asia/Kathmandu",

"Asia/Katmandu",

"Asia/Khandyga",

"Asia/Kolkata",

"Asia/Krasnoyarsk",

"Asia/Kuala_Lumpur",

"Asia/Kuching",

"Asia/Kuwait",

"Asia/Macao",

"Asia/Macau",

"Asia/Magadan",

"Asia/Makassar",

"Asia/Manila",

"Asia/Muscat",

"Asia/Nicosia",

"Asia/Novokuznetsk",

"Asia/Novosibirsk",

"Asia/Omsk",

"Asia/Oral",

"Asia/Phnom_Penh",

"Asia/Pontianak",

"Asia/Pyongyang",

"Asia/Qatar",

"Asia/Qostanay",

"Asia/Qyzylorda",

"Asia/Rangoon",

"Asia/Riyadh",

"Asia/Saigon",

"Asia/Sakhalin",

"Asia/Samarkand",

"Asia/Seoul",

"Asia/Shanghai",

"Asia/Singapore",

"Asia/Srednekolymsk",

"Asia/Taipei",

"Asia/Tashkent",

"Asia/Tbilisi",

"Asia/Tehran",

"Asia/Tel_Aviv",

"Asia/Thimbu",

"Asia/Thimphu",

"Asia/Tokyo",

"Asia/Tomsk",

"Asia/Ujung_Pandang",

"Asia/Ulaanbaatar",

"Asia/Ulan_Bator",

"Asia/Urumqi",

"Asia/Ust-Nera",

"Asia/Vientiane",

"Asia/Vladivostok",

"Asia/Yakutsk",

"Asia/Yangon",

"Asia/Yekaterinburg",

"Asia/Yerevan",

"Atlantic/Azores",

"Atlantic/Bermuda",

"Atlantic/Canary",

"Atlantic/Cape_Verde",

"Atlantic/Faeroe",

"Atlantic/Faroe",

"Atlantic/Jan_Mayen",

"Atlantic/Madeira",

"Atlantic/Reykjavik",

"Atlantic/South_Georgia",

"Atlantic/St_Helena",

"Atlantic/Stanley",

"Australia/ACT",

"Australia/Adelaide",

"Australia/Brisbane",

"Australia/Broken_Hill",

"Australia/Canberra",

"Australia/Currie",

"Australia/Darwin",

"Australia/Eucla",

"Australia/Hobart",

"Australia/LHI",

"Australia/Lindeman",

"Australia/Lord_Howe",

"Australia/Melbourne",

"Australia/NSW",

"Australia/North",

"Australia/Perth",

"Australia/Queensland",

"Australia/South",

"Australia/Sydney",

"Australia/Tasmania",

"Australia/Victoria",

"Australia/West",

"Australia/Yancowinna",

"Brazil/Acre",

"Brazil/DeNoronha",

"Brazil/East",

"Brazil/West",

"CET",

"CST6CDT",

"Canada/Atlantic",

"Canada/Central",

"Canada/Eastern",

"Canada/Mountain",

"Canada/Newfoundland",

"Canada/Pacific",

"Canada/Saskatchewan",

"Canada/Yukon",

"Chile/Continental",

"Chile/EasterIsland",

"Cuba",

"EET",

"EST5EDT",

"Egypt",

"Eire",

"Etc/GMT",

"Etc/GMT+0",

"Etc/GMT+1",

"Etc/GMT+10",

"Etc/GMT+11",

"Etc/GMT+12",

"Etc/GMT+2",

"Etc/GMT+3",

"Etc/GMT+4",

"Etc/GMT+5",

"Etc/GMT+6",

"Etc/GMT+7",

"Etc/GMT+8",

"Etc/GMT+9",

"Etc/GMT-0",

"Etc/GMT-1",

"Etc/GMT-10",

"Etc/GMT-11",

"Etc/GMT-12",

"Etc/GMT-13",

"Etc/GMT-14",

"Etc/GMT-2",

"Etc/GMT-3",

"Etc/GMT-4",

"Etc/GMT-5",

"Etc/GMT-6",

"Etc/GMT-7",

"Etc/GMT-8",

"Etc/GMT-9",

"Etc/GMT0",

"Etc/Greenwich",

"Etc/UCT",

"Etc/UTC",

"Etc/Universal",

"Etc/Zulu",

"Europe/Amsterdam",

"Europe/Andorra",

"Europe/Astrakhan",

"Europe/Athens",

"Europe/Belfast",

"Europe/Belgrade",

"Europe/Berlin",

"Europe/Bratislava",

"Europe/Brussels",

"Europe/Bucharest",

"Europe/Budapest",

"Europe/Busingen",

"Europe/Chisinau",

"Europe/Copenhagen",

"Europe/Dublin",

"Europe/Gibraltar",

"Europe/Guernsey",

"Europe/Helsinki",

"Europe/Isle_of_Man",

"Europe/Istanbul",

"Europe/Jersey",

"Europe/Kaliningrad",

"Europe/Kiev",

"Europe/Kirov",

"Europe/Kyiv",

"Europe/Lisbon",

"Europe/Ljubljana",

"Europe/London",

"Europe/Luxembourg",

"Europe/Madrid",

"Europe/Malta",

"Europe/Mariehamn",

"Europe/Minsk",

"Europe/Monaco",

"Europe/Moscow",

"Europe/Nicosia",

"Europe/Oslo",

"Europe/Paris",

"Europe/Podgorica",

"Europe/Prague",

"Europe/Riga",

"Europe/Rome",

"Europe/Samara",

"Europe/San_Marino",

"Europe/Sarajevo",

"Europe/Saratov",

"Europe/Simferopol",

"Europe/Skopje",

"Europe/Sofia",

"Europe/Stockholm",

"Europe/Tallinn",

"Europe/Tirane",

"Europe/Tiraspol",

"Europe/Ulyanovsk",

"Europe/Uzhgorod",

"Europe/Vaduz",

"Europe/Vatican",

"Europe/Vienna",

"Europe/Vilnius",

"Europe/Volgograd",

"Europe/Warsaw",

"Europe/Zagreb",

"Europe/Zaporozhye",

"Europe/Zurich",

"GB",

"GB-Eire",

"GMT",

"GMT0",

"Greenwich",

"Hongkong",

"Iceland",

"Indian/Antananarivo",

"Indian/Chagos",

"Indian/Christmas",

"Indian/Cocos",

"Indian/Comoro",

"Indian/Kerguelen",

"Indian/Mahe",

"Indian/Maldives",

"Indian/Mauritius",

"Indian/Mayotte",

"Indian/Reunion",

"Iran",

"Israel",

"Jamaica",

"Japan",

"Kwajalein",

"Libya",

"MET",

"MST7MDT",

"Mexico/BajaNorte",

"Mexico/BajaSur",

"Mexico/General",

"NZ",

"NZ-CHAT",

"Navajo",

"PRC",

"PST8PDT",

"Pacific/Apia",

"Pacific/Auckland",

"Pacific/Bougainville",

"Pacific/Chatham",

"Pacific/Chuuk",

"Pacific/Easter",

"Pacific/Efate",

"Pacific/Enderbury",

"Pacific/Fakaofo",

"Pacific/Fiji",

"Pacific/Funafuti",

"Pacific/Galapagos",

"Pacific/Gambier",

"Pacific/Guadalcanal",

"Pacific/Guam",

"Pacific/Honolulu",

"Pacific/Johnston",

"Pacific/Kanton",

"Pacific/Kiritimati",

"Pacific/Kosrae",

"Pacific/Kwajalein",

"Pacific/Majuro",

"Pacific/Marquesas",

"Pacific/Midway",

"Pacific/Nauru",

"Pacific/Niue",

"Pacific/Norfolk",

"Pacific/Noumea",

"Pacific/Pago_Pago",

"Pacific/Palau",

"Pacific/Pitcairn",

"Pacific/Pohnpei",

"Pacific/Ponape",

"Pacific/Port_Moresby",

"Pacific/Rarotonga",

"Pacific/Saipan",

"Pacific/Samoa",

"Pacific/Tahiti",

"Pacific/Tarawa",

"Pacific/Tongatapu",

"Pacific/Truk",

"Pacific/Wake",

"Pacific/Wallis",

"Pacific/Yap",

"Poland",

"Portugal",

"ROK",

"Singapore",

"SystemV/AST4",

"SystemV/AST4ADT",

"SystemV/CST6",

"SystemV/CST6CDT",

"SystemV/EST5",

"SystemV/EST5EDT",

"SystemV/HST10",

"SystemV/MST7",

"SystemV/MST7MDT",

"SystemV/PST8",

"SystemV/PST8PDT",

"SystemV/YST9",

"SystemV/YST9YDT",

"Turkey",

"UCT",

"US/Alaska",

"US/Aleutian",

"US/Arizona",

"US/Central",

"US/East-Indiana",

"US/Eastern",

"US/Hawaii",

"US/Indiana-Starke",

"US/Michigan",

"US/Mountain",

"US/Pacific",

"US/Samoa",

"UTC",

"Universal",

"W-SU",

"WET",

"Zulu",

"EST",

"HST",

"MST",

"ACT",

"AET",

"AGT",

"ART",

"AST",

"BET",

"BST",

"CAT",

"CNT",

"CST",

"CTT",

"EAT",

"ECT",

"IET",

"IST",

"JST",

"MIT",

"NET",

"NST",

"PLT",

"PNT",

"PRT",

"PST",

"SST",

"VST"

]

},

"pageRootPaths":

{

"type": "array",

"items":

{

"type": "string"

}

},

"assetRootPaths":

{

"type": "array",

"items":

{

"type": "string"

}

},

"crawlAssets":

{

"type": "boolean"

},

"crawlPages":

{

"type": "boolean"

},

"pagePathInclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"pagePathExclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"pageNameInclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"pageNameExclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"assetPathInclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"assetPathExclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"assetTypeInclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"assetTypeExclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"assetNameInclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"assetNameExclusionPatterns":

{

"type": "array",

"items":

{

"type": "string"

}

},

"pageComponents": {

"type": "array",

"items": {

"type": "object"

}

},

"contentFragmentVariations": {

"type": "array",

"items": {

"type": "object"

}

},

"cugExemptedPrincipals": {

"type": "array",

"items": {

"type": "string"

}

},

"required":

[]

},

"type": {

"type": "string",

"pattern": "AEM"

},

"enableIdentityCrawler": {

"type": "boolean"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
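As a rough illustration of the top-level constraints at the end of the schema above, the following Python sketch checks a template for the required keys, the syncMode enum, the type pattern, and the secretArn length bounds. The function name `check_aem_template` and the constant names are hypothetical, and the sketch does not inspect nested properties; a full JSON Schema validator would be needed for that.

```python
# Hypothetical helper: covers only the top-level rules visible at the end of
# the schema (required keys, syncMode enum, type pattern, secretArn length).

AEM_REQUIRED_KEYS = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]
AEM_SYNC_MODES = {"FORCED_FULL_CRAWL", "FULL_CRAWL", "CHANGE_LOG"}


def check_aem_template(template):
    """Return a list of problems; an empty list means these checks passed."""
    problems = ["missing required key: " + key
                for key in AEM_REQUIRED_KEYS if key not in template]
    if "syncMode" in template and template["syncMode"] not in AEM_SYNC_MODES:
        problems.append("unsupported syncMode: " + str(template["syncMode"]))
    # The schema pattern "AEM" is unanchored, so a substring test mirrors it.
    if "type" in template and "AEM" not in str(template["type"]):
        problems.append("type must match the pattern AEM")
    secret_arn = template.get("secretArn")
    if secret_arn is not None and not 20 <= len(secret_arn) <= 2048:
        problems.append("secretArn must be 20 to 2048 characters long")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a template at once.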

Amazon FSx (Windows) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the file system ID as part of the connection configuration or repository endpoint details. You must also specify the type of data source as FSX, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon FSx (Windows) JSON schema.

The following table describes the parameters of the Amazon FSx (Windows) JSON schema.

| Configuration | Description |
| --- | --- |
| connectionConfiguration | Configuration information for the endpoint for the data source. |
| repositoryEndpointMetadata | The endpoint information for the data source. |
| fileSystemId | The identifier of the Amazon FSx file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx console. |
| fileSystemType | The Amazon FSx file system type. To use Windows File Server as your type of file system, specify WINDOWS. |
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. |
| All | A list of objects that map attributes or field names of your files in your Amazon FSx data source to Amazon Kendra index field names. For more information, see Mapping data source fields. |
| additionalProperties | Additional configuration options for your content in your data source. |
| isCrawlAcl | true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering. |
| inclusionPatterns | A list of regular expression patterns to include certain files in your Amazon FSx data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. |
| exclusionPatterns | A list of regular expression patterns to exclude certain files in your Amazon FSx data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. |
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information. |
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index; FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync. |
| type | The type of data source. For Windows file system data sources, specify FSX. |
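The precedence rule for inclusionPatterns and exclusionPatterns described in the table can be sketched in Python as follows. This is only an illustration, not Amazon Kendra's implementation: the function name `is_indexed` is hypothetical, and two behaviors are assumptions the guide does not state, namely that an empty inclusion list includes everything and that a pattern matches when it is found anywhere in the path.

```python
import re


def is_indexed(file_path, inclusion_patterns, exclusion_patterns):
    """Decide whether a file would be indexed under the rules in the table.

    Assumptions: an empty inclusion list means "include everything", and a
    pattern matches if it is found anywhere in the path (re.search).
    """
    # Exclusion takes precedence: a file that matches any exclusion pattern
    # is never indexed, even if it also matches an inclusion pattern.
    if any(re.search(pattern, file_path) for pattern in exclusion_patterns):
        return False
    if not inclusion_patterns:
        return True
    # Otherwise the file must match at least one inclusion pattern.
    return any(re.search(pattern, file_path) for pattern in inclusion_patterns)
```

For example, with the inclusion pattern `\.pdf$` and the exclusion pattern `^drafts/`, the path `reports/q1.pdf` is indexed, while `drafts/q1.pdf` is not, because it matches both lists and the exclusion wins.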

Amazon FSx (Windows) JSON schema

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "fileSystemId": {
              "type": "string",
              "pattern": "fs-.*"
            },
            "fileSystemType": {
              "type": "string",
              "pattern": "WINDOWS"
            }
          },
          "required": ["fileSystemId", "fileSystemType"]
        }
      }
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "All": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": ["STRING", "STRING_LIST", "DATE"]
                    },
                    "dataSourceFieldName": {
                      "type": "string"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ]
            }
          },
          "required": ["fieldMappings"]
        }
      },
      "required": ["All"]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "isCrawlAcl": {
          "type": "boolean"
        },
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": []
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "type": {
      "type": "string",
      "pattern": "FSX"
    }
  },
  "version": {
    "type": "string",
    "anyOf": [
      {
        "pattern": "1.0.0"
      }
    ]
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "enableIdentityCrawler",
    "additionalProperties",
    "type"
  ]
}
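To show how a template that follows this schema might be assembled and passed to CreateDataSource, here is a minimal Python sketch. The helper names `build_fsx_windows_template` and `create_template_data_source` are hypothetical, as are the default values. The `kendra_client` argument is assumed to behave like `boto3.client("kendra")`, whose `create_data_source` call accepts the template under `Configuration["TemplateConfiguration"]["Template"]`. Whether an empty field mapping list is accepted by the service is not confirmed here.

```python
import re


def build_fsx_windows_template(file_system_id, sync_mode="FULL_CRAWL",
                               field_mappings=None):
    """Build a template dict shaped like the FSx (Windows) schema above."""
    # The schema pattern "fs-.*" is unanchored, so re.search mirrors it.
    if not re.search(r"fs-.*", file_system_id):
        raise ValueError("fileSystemId must match the pattern fs-.*")
    if sync_mode not in ("FORCED_FULL_CRAWL", "FULL_CRAWL"):
        raise ValueError("unsupported syncMode: " + sync_mode)
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "fileSystemId": file_system_id,
                "fileSystemType": "WINDOWS",
            }
        },
        "repositoryConfigurations": {
            "All": {"fieldMappings": list(field_mappings or [])}
        },
        "additionalProperties": {
            "isCrawlAcl": True,
            "inclusionPatterns": [],
            "exclusionPatterns": [],
        },
        "enableIdentityCrawler": True,
        "syncMode": sync_mode,
        "type": "FSX",
        "version": "1.0.0",
    }


def create_template_data_source(kendra_client, index_id, name, role_arn,
                                template):
    """Call CreateDataSource with Type set to TEMPLATE."""
    return kendra_client.create_data_source(
        Name=name,
        IndexId=index_id,
        Type="TEMPLATE",
        RoleArn=role_arn,
        Configuration={"TemplateConfiguration": {"Template": template}},
    )
```

Passing the client in as an argument keeps the sketch free of any SDK import and makes it easy to exercise with a stub before calling the real service.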

Amazon FSx (NetApp ONTAP) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the file system ID and the storage virtual machine (SVM) as part of the connection configuration or repository endpoint details. You must also specify the type of data source as FSXONTAP, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon FSx (NetApp ONTAP) JSON schema.

The following table describes the parameters of the Amazon FSx (NetApp ONTAP) JSON schema.

| Configuration | Description |
| --- | --- |
| connectionConfiguration | Configuration information for the endpoint for the data source. |
| repositoryEndpointMetadata | The endpoint information for the data source. |
| fileSystemId | The identifier of the Amazon FSx file system. You can find your file system ID on the File Systems dashboard in the Amazon FSx console. For information about how to create a file system in the Amazon FSx console for NetApp ONTAP, see Getting Started Guide for NetApp ONTAP in the FSx for ONTAP User Guide. |
| fileSystemType | The Amazon FSx file system type. To use NetApp ONTAP as your type of file system, specify ONTAP. |
| svmId | The identifier of the storage virtual machine (SVM) used with your Amazon FSx file system for NetApp ONTAP. You can find your SVM ID by going to the File Systems dashboard in the Amazon FSx console, selecting your file system ID, and then selecting Storage virtual machines. For information about how to create a file system in the Amazon FSx console for NetApp ONTAP, see Getting Started Guide for NetApp ONTAP in the FSx for ONTAP User Guide. |
| protocolType | Whether you use the Common Internet File System (CIFS) protocol for Windows, or the Network File System (NFS) protocol for Linux. |
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. |
| file | A list of objects that map attributes or field names of your files in your Amazon FSx data source to Amazon Kendra index field names. For more information, see Mapping data source fields. The data source field names must exist in your files' custom metadata. |
| additionalProperties | Additional configuration options for your content in your data source. |
| crawlAcl | true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering. |
| inclusionPatterns | A list of regular expression patterns to include certain files in your Amazon FSx data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. |
| exclusionPatterns | A list of regular expression patterns to exclude certain files in your Amazon FSx data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index. |
| type | The type of data source. For NetApp ONTAP file system data sources, specify FSXONTAP. |
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index; FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync. |
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Amazon FSx file system. The secret must contain a JSON structure with the following keys: { "username": "[email protected].com", "password": "password" }. If you use the NFS protocol for your Amazon FSx file system, the secret is stored in a JSON structure with the following keys: { "leftId": "left ID", "rightId": "right ID", "preSharedKey": "pre-shared key" }. |

Amazon FSx (NetApp ONTAP) JSON schema

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "fileSystemId": {
              "type": "string",
              "pattern": "^(fs-[0-9a-f]{8,21})$"
            },
            "fileSystemType": {
              "type": "string",
              "enum": ["ONTAP"]
            },
            "svmId": {
              "type": "string",
              "pattern": "^(svm-[0-9a-f]{17,21})$"
            },
            "protocolType": {
              "type": "string",
              "enum": [
                "CIFS",
                "NFS"
              ]
            }
          },
          "required": [
            "fileSystemId",
            "fileSystemType"
          ]
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "file": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": [
                {
                  "type": "object",
                  "properties": {
                    "indexFieldName": {
                      "type": "string",
                      "pattern": "^([a-zA-Z_]{1,20})$"
                    },
                    "indexFieldType": {
                      "type": "string",
                      "enum": [
                        "STRING",
                        "STRING_LIST",
                        "DATE",
                        "LONG"
                      ]
                    },
                    "dataSourceFieldName": {
                      "type": "string",
                      "pattern": "^([a-zA-Z_]{1,20})$"
                    },
                    "dateFieldFormat": {
                      "type": "string",
                      "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                    }
                  },
                  "required": [
                    "indexFieldName",
                    "indexFieldType",
                    "dataSourceFieldName"
                  ]
                }
              ],
              "maxItems": 50
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      },
      "required": [
        "file"
      ]
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "crawlAcl": {
          "type": "boolean"
        },
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string",
            "maxLength": 30
          },
          "maxItems": 100
        },
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string",
            "maxLength": 30
          },
          "maxItems": 100
        }
      }
    },
    "type": {
      "type": "string",
      "pattern": "FSXONTAP"
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "secretArn": {
      "type": "string",
      "pattern": "arn:aws:secretsmanager:.*"
    }
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "secretArn",
    "type"
  ]
}
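The identifier patterns and the protocolType enum in this schema can be checked before a template is submitted. The following Python sketch is one way to do that; the function name `build_ontap_endpoint` and the constant names are hypothetical, and the regular expressions are copied from the schema above.

```python
import re

# Patterns and enum values taken from the repositoryEndpointMetadata schema.
FILE_SYSTEM_ID = re.compile(r"^(fs-[0-9a-f]{8,21})$")
SVM_ID = re.compile(r"^(svm-[0-9a-f]{17,21})$")
PROTOCOL_TYPES = ("CIFS", "NFS")


def build_ontap_endpoint(file_system_id, svm_id, protocol_type):
    """Return a repositoryEndpointMetadata dict after validating its values."""
    # fullmatch rejects a trailing newline, which "$" alone would tolerate
    # in Python.
    if not FILE_SYSTEM_ID.fullmatch(file_system_id):
        raise ValueError("invalid fileSystemId: " + file_system_id)
    if not SVM_ID.fullmatch(svm_id):
        raise ValueError("invalid svmId: " + svm_id)
    if protocol_type not in PROTOCOL_TYPES:
        raise ValueError("protocolType must be CIFS or NFS")
    return {
        "fileSystemId": file_system_id,
        "fileSystemType": "ONTAP",
        "svmId": svm_id,
        "protocolType": protocol_type,
    }
```

Validating locally gives a clearer error than a rejected API call, although the service remains the authority on what it accepts.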

Alfresco template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the Alfresco site ID, repository URL, user interface URL, authentication type, whether you use cloud or on-premises, and the type of content you want to crawl. You provide this as a part of the connection configuration or repository endpoint details. Also specify the type of data source as ALFRESCO, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Alfresco JSON schema.

The following table describes the parameters of the Alfresco JSON schema.

| Configuration | Description |
| --- | --- |
| connectionConfiguration | Configuration information for the endpoint for the data source. |
| repositoryEndpointMetadata | The endpoint information for the data source. |
| siteId | The identifier of the Alfresco site. |
| repoUrl | The URL of your Alfresco repository. You can get the repository URL from your Alfresco administrator. For example, if you use Alfresco Cloud (PaaS), the repository URL could be https://company.alfrescocloud.com. Or, if you use Alfresco On-Premises, the repository URL could be https://company-alfresco-instance.company-domain.suffix:port. |
| webAppUrl | The URL of your Alfresco user interface. You can get the Alfresco user interface URL from your Alfresco administrator. For example, the user interface URL could be https://example.com. |
| repositoryAdditionalProperties | Additional properties to connect with the repository/data source endpoint. |
| authType | The type of authentication that you use, whether OAuth2 or Basic. |
| type (deployment) | The type of Alfresco that you use, whether PAAS or ON_PREM. |
| crawlType | The type of content that you want to crawl, whether ASPECT (content marked with 'Aspects' in Alfresco), SITE_ID (content within a specific Alfresco site), or ALL_SITES (content across all your Alfresco sites). |
| repositoryConfigurations | Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. |
| document, comment | A list of objects that map the attributes or field names of your Alfresco documents and comments to Amazon Kendra index field names. For more information, see Mapping data source fields. |
| additionalProperties | Additional configuration options for your content in your data source. |
| aspectName | The name of a specific 'Aspect' that you want to index. |
| aspectProperties | A list of specific 'Aspect' content properties that you want to index. |
| enableFineGrainedControl | true to crawl 'Aspects'. |
| isCrawlComment | true to crawl comments. |
| inclusionFileNamePatterns, inclusionFileTypePatterns, inclusionFilePathPatterns | A list of regular expression patterns to include certain files in your Alfresco data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. |
| exclusionFileNamePatterns, exclusionFileTypePatterns, exclusionFilePathPatterns | A list of regular expression patterns to exclude certain files in your Alfresco data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index. |
| type | The type of data source. Specify ALFRESCO as your data source type. |
| secretArn | The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs that are required to connect to your Alfresco instance. The secret must contain a JSON structure with the following keys. If using basic authentication: { "username": "user name", "password": "password" }. If using OAuth 2.0 authentication: { "clientId": "client ID", "clientSecret": "client secret", "tokenUrl": "token URL" }. |
| syncMode | Specify how Amazon Kendra should update your index when your data source content changes. You can choose between: FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index; FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync. |
| enableIdentityCrawler | true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information. |
| version | The version of this template that is currently supported. |

Alfresco JSON schema

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "repositoryEndpointMetadata": {
          "type": "object",
          "properties": {
            "siteId": {
              "type": "string"
            },
            "repoUrl": {
              "type": "string"
            },
            "webAppUrl": {
              "type": "string"
            },
            "repositoryAdditionalProperties": {
              "type": "object",
              "properties": {
                "authType": {
                  "type": "string",
                  "enum": [
                    "OAuth2",
                    "Basic"
                  ]
                },
                "type": {
                  "type": "string",
                  "enum": [
                    "PAAS",
                    "ON_PREM"
                  ]
                },
                "crawlType": {
                  "type": "string",
                  "enum": [
                    "ASPECT",
                    "SITE_ID",
                    "ALL_SITES"
                  ]
                }
              }
            }
          }
        }
      },
      "required": [
        "repositoryEndpointMetadata"
      ]
    },
    "repositoryConfigurations": {
      "type": "object",
      "properties": {
        "document": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE",
                          "STRING_LIST",
                          "LONG"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        },
        "comment": {
          "type": "object",
          "properties": {
            "fieldMappings": {
              "type": "array",
              "items": {
                "anyOf": [
                  {
                    "type": "object",
                    "properties": {
                      "indexFieldName": {
                        "type": "string"
                      },
                      "indexFieldType": {
                        "type": "string",
                        "enum": [
                          "STRING",
                          "DATE",
                          "STRING_LIST",
                          "LONG"
                        ]
                      },
                      "dataSourceFieldName": {
                        "type": "string"
                      },
                      "dateFieldFormat": {
                        "type": "string",
                        "pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"
                      }
                    },
                    "required": [
                      "indexFieldName",
                      "indexFieldType",
                      "dataSourceFieldName"
                    ]
                  }
                ]
              }
            }
          },
          "required": [
            "fieldMappings"
          ]
        }
      }
    },
    "additionalProperties": {
      "type": "object",
      "properties": {
        "aspectName": {
          "type": "string"
        },
        "aspectProperties": {
          "type": "array"
        },
        "enableFineGrainedControl": {
          "type": "boolean"
        },
        "isCrawlComment": {
          "type": "boolean"
        },
        "inclusionFileNamePatterns": {
          "type": "array"
        },
        "exclusionFileNamePatterns": {
          "type": "array"
        },
        "inclusionFileTypePatterns": {
          "type": "array"
        },
        "exclusionFileTypePatterns": {
          "type": "array"
        },
        "inclusionFilePathPatterns": {
          "type": "array"
        },
        "exclusionFilePathPatterns": {
          "type": "array"
        }
      }
    },
    "type": {
      "type": "string",
      "pattern": "ALFRESCO"
    },
    "secretArn": {
      "type": "string",
      "minLength": 20,
      "maxLength": 2048
    },
    "syncMode": {
      "type": "string",
      "enum": [
        "FORCED_FULL_CRAWL",
        "FULL_CRAWL"
      ]
    },
    "enableIdentityCrawler": {
      "type": "boolean"
    },
    "version": {
      "type": "string",
      "anyOf": [
        {
          "pattern": "1.0.0"
        }
      ]
    }
  },
  "required": [
    "connectionConfiguration",
    "repositoryConfigurations",
    "additionalProperties",
    "type",
    "secretArn"
  ]
}
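The secret referenced by secretArn needs different keys depending on the authType value (Basic or OAuth2), as listed in the parameter table for this schema. A minimal Python sketch of assembling the secret string follows; the function name `build_alfresco_secret` is hypothetical, and storing the resulting string in AWS Secrets Manager is left out.

```python
import json

# Keys listed in the secretArn description for each authentication type.
SECRET_KEYS = {
    "Basic": ("username", "password"),
    "OAuth2": ("clientId", "clientSecret", "tokenUrl"),
}


def build_alfresco_secret(auth_type, **values):
    """Return the JSON string to store as the secret for the given authType."""
    if auth_type not in SECRET_KEYS:
        raise ValueError("authType must be Basic or OAuth2")
    missing = [key for key in SECRET_KEYS[auth_type] if key not in values]
    if missing:
        raise ValueError("missing secret keys: " + ", ".join(missing))
    # Keep only the keys that belong to this authentication type.
    return json.dumps({key: values[key] for key in SECRET_KEYS[auth_type]})
```

Dropping keys that do not belong to the chosen authentication type keeps stray credentials out of the stored secret.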

Aurora (MySQL) template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as mysql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Aurora (MySQL) JSON schema.

The following table describes the parameters of the Aurora (MySQL) JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata Required configuration information for connecting your data source.

• dbType: The type of database that you use: mysql, db2, postgresql, oracle, or sqlserver.

• dbHost: The database host name.

• dbPort: The database port.

• dbInstance: The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.

primaryKey Provide the primary key for the database table. This identifies a table within your database.

titleColumn Provide the name of the document title column within your database table.

bodyColumn Provide the name of the document body column within your database table.

sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32 KB. Amazon Kendra crawls all database content that matches your query.

timestampColumn Enter the name of the column that contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.

timestampFormat Enter the name of the column that contains the time stamp formats to use to detect content changes and re-sync your content.

timezone Enter the name of the column that contains time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon Kendra uses to detect content changes. Amazon Kendra re-indexes content when there is a change in any of these columns.

allowedUsersColumn Enter the name of the column that contains the user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column that contains the groups to be allowed access to content.

sourceURIColumn Enter the name of the column that contains the source URLs to be indexed.

isSslEnabled A Boolean that indicates whether SSL is enabled for the connection to your database.

type The type of data source. Specify JDBC as your data source type.

syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:

{

"user name": "database user name",

"password": "password"

}

version The version of the template that is currently supported.
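As a rough illustration of how the parameters in the table fit together, the following Python sketch builds an Aurora (MySQL) template as a dictionary and checks the top-level keys that the JSON schema in this section lists as required. The host name, column names, query, index field name, and secret ARN are placeholder values chosen for this example, not values from this guide.

```python
# Top-level keys that the Aurora (MySQL) JSON schema lists as required.
REQUIRED_TOP_LEVEL = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]

# Every value below is a hypothetical placeholder.
aurora_mysql_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "mysql",
            "dbHost": "example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
            "dbPort": "3306",  # the schema types dbPort as a string
            "dbInstance": "exampledb",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "_document_title",
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "title",
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT id, title, body FROM articles",
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret",
    "version": "1.0.0",
}


def missing_required_keys(template):
    """Return the required top-level keys that are absent from the template."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in template]


print(missing_required_keys(aurora_mysql_template))
```

A check like this only covers the top level; the nested objects carry their own required lists in the schema.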

Aurora (MySQL) JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
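The sqlQuery property in the schema above negates the pattern ";+". Because JSON Schema patterns are not anchored, this rejects any query string that contains a semicolon. The following Python sketch shows one way to pre-check a query before you place it in a template. The function name and sample queries are hypothetical, and reading the 32 KB limit from the parameter table as 32 * 1024 bytes of UTF-8 text is an assumption.

```python
import re

MAX_QUERY_BYTES = 32 * 1024  # assumption: "32 KB" read as 32 * 1024 bytes of UTF-8
SEMICOLON = re.compile(r";+")  # the same pattern that the schema negates


def check_sql_query(query):
    """Return a list of problems found in the query; an empty list means none."""
    problems = []
    if SEMICOLON.search(query):
        problems.append("query contains a semicolon")
    if len(query.encode("utf-8")) >= MAX_QUERY_BYTES:
        problems.append("query is not less than 32 KB")
    return problems


print(check_sql_query("SELECT id, title, body FROM articles"))
print(check_sql_query("SELECT 1; DROP TABLE articles"))
```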

Aurora (PostgreSQL) template schema

You include a JSON object that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as postgresql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Aurora (PostgreSQL) JSON schema.

The following table describes the parameters of the Aurora (PostgreSQL) JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata Required configuration information for connecting your data source.

• dbType: The type of database that you use: mysql, db2, postgresql, oracle, or sqlserver.

• dbHost: The database host name.

• dbPort: The database port.

• dbInstance: The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.

primaryKey Provide the primary key for the database table. This identifies a table within your database.

titleColumn Provide the name of the document title column within your database table.

bodyColumn Provide the name of the document body column within your database table.

sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32 KB. Amazon Kendra crawls all database content that matches your query.

timestampColumn Enter the name of the column that contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.

timestampFormat Enter the name of the column that contains the time stamp formats to use to detect content changes and re-sync your content.

timezone Enter the name of the column that contains time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon Kendra uses to detect content changes. Amazon Kendra re-indexes content when there is a change in any of these columns.

allowedUsersColumn Enter the name of the column that contains the user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column that contains the groups to be allowed access to content.

sourceURIColumn Enter the name of the column that contains the source URLs to be indexed.

isSslEnabled A Boolean that indicates whether SSL is enabled for the connection to your database.

type The type of data source. Specify JDBC as your data source type.

syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:

{

"user name": "database user name",

"password": "password"

}

version The version of the template that is currently supported.
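The secret structure described for secretArn can be produced with any JSON serializer. The following Python sketch builds the secret string with the two keys shown in the table. The function name and the credentials are hypothetical, and storing the string in Secrets Manager is outside the scope of the example.

```python
import json


def build_secret_string(user_name, password):
    """Serialize database credentials using the keys shown in the table."""
    # The first key is "user name", written with a space, as in the table.
    return json.dumps({"user name": user_name, "password": password})


secret_string = build_secret_string("example_user", "example_password")
print(secret_string)
```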

Aurora (PostgreSQL) JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}

Amazon RDS (Microsoft SQL Server) template schema

You include a JSON object that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as sqlserver, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (Microsoft SQL Server) JSON schema.

The following table describes the parameters of the Amazon RDS (Microsoft SQL Server) JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata Required configuration information for connecting your data source.

• dbType: The type of database that you use: mysql, db2, postgresql, oracle, or sqlserver.

• dbHost: The database host name.

• dbPort: The database port.

• dbInstance: The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.

primaryKey Provide the primary key for the database table. This identifies a table within your database.

titleColumn Provide the name of the document title column within your database table.

bodyColumn Provide the name of the document body column within your database table.

sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32 KB. Amazon Kendra crawls all database content that matches your query.

timestampColumn Enter the name of the column that contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.

timestampFormat Enter the name of the column that contains the time stamp formats to use to detect content changes and re-sync your content.

timezone Enter the name of the column that contains time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon Kendra uses to detect content changes. Amazon Kendra re-indexes content when there is a change in any of these columns.

allowedUsersColumn Enter the name of the column that contains the user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column that contains the groups to be allowed access to content.

sourceURIColumn Enter the name of the column that contains the source URLs to be indexed.

isSslEnabled A Boolean that indicates whether SSL is enabled for the connection to your database.

type The type of data source. Specify JDBC as your data source type.

syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:

{

"user name": "database user name",

"password": "password"

}

version The version of the template that is currently supported.
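To show where a template sits in a CreateDataSource request, the following Python sketch assembles the request parameters as a dictionary. The nesting under Configuration, TemplateConfiguration, and Template reflects the CreateDataSource request shape and should be checked against the API reference. The data source name, index ID, role ARN, and shortened template are placeholders, and no call to the service is made.

```python
def build_create_data_source_request(name, index_id, role_arn, template):
    """Assemble CreateDataSource parameters for a template-based data source."""
    return {
        "Name": name,
        "IndexId": index_id,
        "RoleArn": role_arn,
        "Type": "TEMPLATE",  # the Type to use when you supply a template schema
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


# Shortened, hypothetical template; a real one needs every required field.
template = {"type": "JDBC", "syncMode": "FORCED_FULL_CRAWL"}

request = build_create_data_source_request(
    name="example-sqlserver-source",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    template=template,
)
print(request["Type"])
```

With an AWS SDK, a dictionary like this could supply the arguments for the CreateDataSource call.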

Amazon RDS (Microsoft SQL Server) JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
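The syncMode property in the schema above accepts one of three values. The following Python sketch checks a value against that list and returns a one-line summary taken from the parameter table. The function name and the wording of the summaries are this example's own.

```python
# Allowed syncMode values, with summaries paraphrased from the parameter table.
SYNC_MODES = {
    "FORCED_FULL_CRAWL": "index all content again, replacing existing content",
    "FULL_CRAWL": "index only new, modified, and deleted content",
    "CHANGE_LOG": "index only new and modified content",
}


def describe_sync_mode(mode):
    """Return a short summary of a syncMode value, or raise if it is unknown."""
    try:
        return SYNC_MODES[mode]
    except KeyError:
        raise ValueError(f"syncMode must be one of {sorted(SYNC_MODES)}") from None


print(describe_sync_mode("CHANGE_LOG"))
```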

Amazon RDS (MySQL) template schema

You include a JSON object that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as mysql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (MySQL) JSON schema.

The following table describes the parameters of the Amazon RDS (MySQL) JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata Required configuration information for connecting your data source.

• dbType: The type of database that you use: mysql, db2, postgresql, oracle, or sqlserver.

• dbHost: The database host name.

• dbPort: The database port.

• dbInstance: The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.

primaryKey Provide the primary key for the database table. This identifies a table within your database.

titleColumn Provide the name of the document title column within your database table.

bodyColumn Provide the name of the document body column within your database table.

sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32 KB. Amazon Kendra crawls all database content that matches your query.

timestampColumn Enter the name of the column that contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.

timestampFormat Enter the name of the column that contains the time stamp formats to use to detect content changes and re-sync your content.

timezone Enter the name of the column that contains time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon Kendra uses to detect content changes. Amazon Kendra re-indexes content when there is a change in any of these columns.

allowedUsersColumn Enter the name of the column that contains the user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column that contains the groups to be allowed access to content.

sourceURIColumn Enter the name of the column that contains the source URLs to be indexed.

isSslEnabled A Boolean that indicates whether SSL is enabled for the connection to your database.

type The type of data source. Specify JDBC as your data source type.

syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:

{

"user name": "database user name",

"password": "password"

}

version The version of the template that is currently supported.
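The document parameter holds a fieldMappings list in which each entry needs three keys. The following Python sketch builds such entries with a small helper. The helper name, the column names, and the index field names and types are illustrative values, not requirements of the template.

```python
def field_mapping(index_field_name, index_field_type, data_source_field_name):
    """Return one fieldMappings entry with the three keys the schema requires."""
    return {
        "indexFieldName": index_field_name,
        "indexFieldType": index_field_type,
        "dataSourceFieldName": data_source_field_name,
    }


# Hypothetical columns from an example table.
document = {
    "fieldMappings": [
        field_mapping("_document_title", "STRING", "title"),
        field_mapping("_created_at", "DATE", "created_on"),
    ]
}
print(len(document["fieldMappings"]))
```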

Amazon RDS (MySQL) JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}

Amazon RDS (Oracle) template schema

You include a JSON object that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as oracle, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (Oracle) JSON schema.

The following table describes the parameters of the Amazon RDS (Oracle) JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata Required configuration information for connecting your data source.

• dbType: The type of database that you use: mysql, db2, postgresql, oracle, or sqlserver.

• dbHost: The database host name.

• dbPort: The database port.

• dbInstance: The database instance.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.

primaryKey Provide the primary key for the database table. This identifies a table within your database.

titleColumn Provide the name of the document title column within your database table.

bodyColumn Provide the name of the document body column within your database table.

sqlQuery Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32 KB. Amazon Kendra crawls all database content that matches your query.

timestampColumn Enter the name of the column that contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.

timestampFormat Enter the name of the column that contains the time stamp formats to use to detect content changes and re-sync your content.

timezone Enter the name of the column that contains time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon Kendra uses to detect content changes. Amazon Kendra re-indexes content when there is a change in any of these columns.

allowedUsersColumn Enter the name of the column that contains the user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column that contains the groups to be allowed access to content.

sourceURIColumn Enter the name of the column that contains the source URLs to be indexed.

isSslEnabled A Boolean that indicates whether SSL is enabled for the connection to your database.

type The type of data source. Specify JDBC as your data source type.

syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:

{

"user name": "database user name",

"password": "password"

}

version The version of the template that is currently supported.

Amazon RDS (Oracle) JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
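
The sqlQuery property in the schema above carries the constraint "not": {"pattern": ";+"}. Because a JSON Schema pattern matches anywhere in the string, this rejects any query that contains a semicolon. The following Python sketch reproduces that check locally so that a query can be tested before it is placed in a template. The function name sql_query_allowed is a hypothetical helper, not part of any Amazon Kendra API.

```python
import re

# JSON Schema "pattern" is an unanchored search, so ";+" matches any string
# that contains at least one semicolon. Wrapping it in "not" rejects those strings.
SEMICOLON = re.compile(r";+")


def sql_query_allowed(sql_query: str) -> bool:
    """Return True if sql_query contains no semicolon, mirroring the schema rule."""
    return SEMICOLON.search(sql_query) is None


if __name__ == "__main__":
    for query in ("SELECT id, title, body FROM docs",
                  "SELECT id FROM docs; DROP TABLE docs"):
        print(query, "->", "allowed" if sql_query_allowed(query) else "rejected")
```

This check covers only the semicolon rule. It does not enforce the 32KB size limit described in the table or verify that the query is valid SQL.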

Amazon RDS (PostgreSQL) template schema

You include a JSON that contains the data source schema as part of the TemplateConﬁguration

object. Specify the type of data source as JDBC, the database type as postgresql, a secret for

your authentication credentials, and other necessary conﬁgurations. You then specify TEMPLATE as

the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Amazon RDS (PostgreSQL) JSON

schema.
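
As an illustration, the following Python sketch builds a template dictionary for a PostgreSQL database and checks it against the required properties listed in the Amazon RDS (PostgreSQL) JSON schema. Every value (host name, port, column names, query, and secret ARN) is a hypothetical placeholder, and missing_required is a helper written for this example only. With the AWS SDK for Python, a dictionary like this would typically be passed as the Template inside TemplateConfiguration when calling CreateDataSource with the Type set to TEMPLATE; treat that wiring as an assumption to confirm against the API reference.

```python
# All values below are hypothetical placeholders for illustration.
template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "postgresql",
            "dbHost": "mydb.example.us-east-1.rds.amazonaws.com",
            "dbPort": "5432",  # the schema types dbPort as a string
            "dbInstance": "postgres",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "_document_title",
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "title",
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT id, title, body FROM articles",
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "version": "1.0.0",
}

# Required property names copied from the "required" arrays in the schema,
# keyed by the dotted path of the object that must contain them.
REQUIRED = {
    "": ["connectionConfiguration", "repositoryConfigurations", "syncMode",
         "additionalProperties", "secretArn", "type"],
    "connectionConfiguration.repositoryEndpointMetadata":
        ["dbType", "dbHost", "dbPort", "dbInstance"],
    "additionalProperties":
        ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"],
}


def missing_required(candidate: dict) -> list:
    """Return dotted paths of required properties that are absent."""
    missing = []
    for path, keys in REQUIRED.items():
        node = candidate
        for part in filter(None, path.split(".")):
            node = node.get(part, {}) if isinstance(node, dict) else {}
        if not isinstance(node, dict):
            node = {}
        missing += [f"{path}.{key}".lstrip(".") for key in keys if key not in node]
    return missing


if __name__ == "__main__":
    print(missing_required(template))
```

The check is deliberately shallow. It confirms only that required keys are present and does not replace full JSON Schema validation.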

The following table describes the parameters of the Amazon RDS (PostgreSQL) JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata Required conﬁguration information for

connecting your data source.

• dbType: The type of database that you use, whether mysql, db2, postgresql,
oracle, or sqlserver.

• dbHost: The database host name.

• dbPort: The database port.

• dbInstance: The database instance.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

Specify the type of data source and the secret

ARN.

document A list of objects that map the attributes or

ﬁeld names of your database content to

Amazon Kendra index ﬁeld names. For more

information, see Mapping data source ﬁelds.

additionalProperties Additional conﬁguration options for your

content in your data source. Use to include or

exclude speciﬁc content in your database data

source.

primaryKey Provide the primary key for the database

table. This identiﬁes a table within your

database.

titleColumn Provide the name of the document title

column within your database table.

bodyColumn Provide the name of the document body

column within your database table.

sqlQuery Enter SQL query statements like SELECT and

JOIN operations. SQL queries must be less

than 32KB. Amazon Kendra will crawl all

database content that matches your query.

timestampColumn Enter the name of the column which contains

time stamps. Amazon Kendra uses time stamp

information to detect changes in your content

and sync only changed content.

timestampFormat Enter the name of the column which contains

time stamp formats to use to detect content

changes and re-sync your content.

timezone Enter the name of the column which contains

time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon

Kendra will use to detect content changes.

Amazon Kendra will re-index content when

there is a change in any of these columns.

allowedUsersColumn Enter the name of the column which contains

User IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column which contains

groups to be allowed access to content.

sourceURIColumn Enter the name of the column which contains

Source URLs to be indexed.

isSslEnabled Set to true to use SSL when Amazon Kendra

connects to your database. The schema defines

this property as a boolean value.

type

The type of data source. Specify JDBC as your

data source type.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each
time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data
source syncs with your index. Amazon Kendra can use your data source's mechanism
for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source
syncs with your index. Amazon Kendra can use your data source's mechanism for
tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a

Secrets Manager secret that contains user

name and password required to connect to

your database. The secret must contain a

JSON structure with the following keys:

{

"user name": "database user name",

"password": " password"

}

version The version of the template that is currently

supported.

Amazon RDS (PostgreSQL) JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}

Amazon S3 template schema

You include a JSON that contains the data source schema as part of the template conﬁguration.

You provide the name of the S3 bucket as a part of the connection conﬁguration or repository

endpoint details. Also specify the type of data source as S3, and other necessary conﬁgurations.

You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See S3 JSON schema.

The following table describes the parameters of the Amazon S3 JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

BucketName The name of your Amazon S3 bucket.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

additionalProperties Additional conﬁguration options for your

content in your data source.

• inclusionPatterns

• exclusionPatterns

• inclusionPreﬁxes

• exclusionPreﬁxes

A list of regular expression patterns to include or exclude specific files in your
Amazon S3 data source. Files that match the patterns are included in the index.
Files that don't match the patterns are excluded from the index. If a file matches
both an inclusion and exclusion pattern, the exclusion pattern takes precedence
and the file isn't included in the index.

aclConfigurationFilePath The path to the file that controls access to

documents in an Amazon Kendra index.

metadataFilesPreﬁx The location within your bucket for metadata

ﬁles.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each
time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data
source syncs with your index. Amazon Kendra can use your data source's mechanism
for tracking content changes and index content that changed since the last sync.

type

The type of data source. Specify S3 as your

data source type.

version The version of the template that is supported.
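
The precedence rule for inclusion and exclusion patterns described in the table can be sketched in Python as follows. This is an illustration only, not Amazon Kendra's implementation. The function name is_indexed is hypothetical, and two behaviors are assumptions: patterns are applied with an unanchored search, and an empty inclusion list is treated as "include everything".

```python
import re


def is_indexed(key: str, inclusion_patterns: list, exclusion_patterns: list) -> bool:
    """Decide whether an object key would be indexed under the stated rules."""
    # Exclusion takes precedence: a match here wins even if an inclusion
    # pattern also matches.
    if any(re.search(pattern, key) for pattern in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, every remaining file is included.
    if not inclusion_patterns:
        return True
    # Otherwise the file must match at least one inclusion pattern.
    return any(re.search(pattern, key) for pattern in inclusion_patterns)


if __name__ == "__main__":
    include = [r"\.pdf$"]
    exclude = [r"^docs/private/"]
    for key in ("docs/guide.pdf", "docs/private/plan.pdf", "docs/notes.txt"):
        print(key, "->", is_indexed(key, include, exclude))
```

Here docs/private/plan.pdf matches both lists, so the exclusion pattern decides the outcome and the file is left out of the index.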

S3 JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"BucketName": {

"type": "string"

}

},

"required": [

"BucketName"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING"

]

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

"document"

]

},

"additionalProperties": {

"type": "object",

"properties": {

"inclusionPatterns": {

"type": "array"

},

"exclusionPatterns": {

"type": "array"

},

"inclusionPrefixes": {

"type": "array"

},

"exclusionPrefixes": {

"type": "array"

},

"aclConfigurationFilePath": {

"type": "string"

},

"metadataFilesPrefix": {

"type": "string"

}

},

"syncMode": {

"type": "string",

"enum": [

"FULL_CRAWL",

"FORCED_FULL_CRAWL"

]

},

"type": {

"type": "string",

"pattern": "S3"

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

}

},

"required": [

"connectionConfiguration",

"type",

"syncMode",

"repositoryConfigurations"

]

}

Amazon Kendra Web Crawler template schema

You include a JSON that contains the data source schema as part of the TemplateConﬁguration

object.

You provide the seed or starting point URLs, or you can provide the sitemap URLs, as part of the

connection configuration or repository endpoint details. Instead of manually listing all your URLs,

you can provide the path to the Amazon S3 bucket that stores a text file for your list of seed URLs

or sitemap XML files, which you can combine in a ZIP file in S3.

You also specify the type of data source as WEBCRAWLERV2, the website authentication

credentials and authentication type if your websites require authentication, and other necessary

conﬁgurations.

You then specify TEMPLATE as the Type when you call CreateDataSource.

Important

Web Crawler v2.0 connector creation is not supported by AWS CloudFormation. Use the

Web Crawler v1.0 connector if you need AWS CloudFormation support.

When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other

Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own

web pages, or web pages that you have authorization to index. To learn how to stop Amazon Kendra

Web Crawler from indexing your websites, see Conﬁguring the robots.txt ﬁle for Amazon Kendra

Web Crawler.

You can use the template provided in this developer guide. See Amazon Kendra Web Crawler JSON

schema.

The following table describes the parameters of the Amazon Kendra Web Crawler JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

siteMapUrls The list of sitemap URLs for the websites that

you want to crawl. You can list up to three

sitemap URLs.

s3SeedUrl The S3 path to the text ﬁle that stores the list

of seed or starting point URLs. For example,

s3://bucket-name/directory/. Each URL in the

text ﬁle must be formatted on a separate line.

You can list up to 100 seed URLs in a ﬁle.

s3SiteMapUrl The S3 path to the sitemap XML files. For example,
s3://bucket-name/directory/. You can list up to three sitemap XML files.
You can combine multiple sitemap files into a ZIP file and store the ZIP file
in your Amazon S3 bucket.

seedUrlConnections The list of seed or starting point URLs for the

websites that you want to crawl. You can list up

to 100 seed URLs.

seedUrl The seed or starting point URL.

authentication The authentication type if your websites

require the same authentication; otherwise,

specify NoAuthentication.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• webPage

• attachment

A list of objects that map the attributes or

ﬁeld names of your web pages and web page

ﬁles to Amazon Kendra index ﬁeld names. For

example, the HTML web page title tag can be

mapped to the _document_title index

ﬁeld. For more information, see Mapping data

source ﬁelds.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each
time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data
source syncs with your index. Amazon Kendra can use your data source's mechanism
for tracking content changes and index content that changed since the last sync.

additionalProperties Additional conﬁguration options for your

content in your data source.

rateLimit The maximum number of URLs crawled per

website host per minute.

maxFileSize The maximum size (in MB) of a web page or

attachment to crawl.

crawlDepth The number of levels from the seed URL to

crawl. For example, the seed URL page is

depth 1 and any hyperlinks on this page that

are also crawled are depth 2.

maxLinksPerUrl The maximum number of URLs on a web

page to include when crawling a website.

This number is per web page. As a website's

web pages are crawled, any URLs that the

webpages link to also are crawled. URLs on a

web page are crawled in order of appearance.

crawlSubDomain

true to crawl the website domains with subdomains. For example, if the seed URL is
"abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.
If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon Kendra only
crawls the domains of the websites that you want to crawl.

crawlAllDomain

true to crawl the website domains with subdomains and other domains the web pages
link to. If you don't set crawlSubDomain or crawlAllDomain to true, then Amazon
Kendra only crawls the domains of the websites that you want to crawl.

honorRobots

true to respect the robots.txt directives of

the websites that you want to crawl. These

directives control how Amazon Kendra Web

Crawler crawls the websites, whether Amazon

Kendra can crawl only speciﬁc content or not

crawl any content.

crawlAttachments

true to crawl ﬁles that the web pages link to.

• inclusionURLCrawlPatterns

• inclusionURLIndexPatterns

A list of regular expression patterns to include

crawling certain URLs and indexing any

hyperlinks on these URL web pages. URLs

that match the patterns are included in the

index. URLs that don't match the patterns are

excluded from the index. If a URL matches

both an inclusion and exclusion pattern, the

exclusion pattern takes precedence, and the

URL/website's web pages aren't included in

the index.

• exclusionURLCrawlPatterns

• exclusionURLIndexPatterns

A list of regular expression patterns to exclude

crawling certain URLs and indexing any

hyperlinks on these URL web pages. URLs

that match the patterns are excluded from

the index. URLs that don't match the patterns

are included in the index. If a URL matches

both an inclusion and exclusion pattern, the

exclusion pattern takes precedence, and the

URL/website's web pages aren't included in

the index.

inclusionFileIndexPatterns A list of regular expression patterns to include

certain web page ﬁles. Files that match the

patterns are included in the index. Files that

don't match the patterns are excluded from

the index. If a ﬁle matches both an inclusion

and exclusion pattern, the exclusion pattern

takes precedence, and the ﬁle isn't included in

the index.

exclusionFileIndexPatterns A list of regular expression patterns to exclude

certain web page ﬁles. Files that match the

patterns are excluded from the index. Files

that don't match the patterns are included in

the index. If a ﬁle matches both an inclusion

and exclusion pattern, the exclusion pattern

takes precedence, and the ﬁle isn't included in

the index.

proxy Conﬁguration information required to connect

to your internal websites via a web proxy.

host The host name of the proxy server you want to use to connect to internal
websites. For example, the host name of https://a.example.com/page1.html is
"a.example.com".

port The port number of the proxy server you want

to use to connect to internal websites. For

example, 443 is the standard port for HTTPS.

secretArn (proxy) If web proxy credentials are required to

connect to a website host, you can create an

AWS Secrets Manager secret that stores the

credentials. Provide the Amazon Resource

Name (ARN) of the secret.

type

The type of data source. Specify WEBCRAWLERV2 as your data source type.

secretArn The Amazon Resource Name (ARN) of an AWS

Secrets Manager secret that's used if your

websites require authentication to access

the websites. You store the authentication

credentials for the website in the secret that

contains JSON key-value pairs.

If you use basic or NTLM/Kerberos authentication, enter the user name and password.
The JSON keys in the secret must be userName and password. The NTLM authentication
protocol includes password hashing, and the Kerberos authentication protocol
includes password encryption.

If you use SAML or form authentication, enter the user name and password, the XPath
for the user name field (and user name button if using SAML), the XPaths for the
password field and button, and the login page URL. The JSON keys in the secret must
be userName, password, userNameFieldXpath, userNameButtonXpath, passwordFieldXpath,
passwordButtonXpath, and loginPageUrl. You can find the XPaths (XML Path Language)
of elements using your web browser's developer tools. XPaths usually follow this
format: //tagname[@Attribute='Value'].

Amazon Kendra also checks if the endpoint information (seed URLs) included in the
secret is the same as the endpoint information specified in your data source
endpoint configuration details.

version The version of this template that is currently

supported.
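
To make the crawlDepth and maxLinksPerUrl descriptions in the table concrete, the following Python sketch walks a small in-memory link graph. It is a simplified model, not the crawler's actual algorithm. The function plan_crawl and the example URLs are hypothetical, the limits are passed as integers even though the schema types them as strings, and rate limits, robots.txt handling, and domain rules are ignored.

```python
from collections import deque


def plan_crawl(seed: str, links: dict, crawl_depth: int, max_links_per_url: int) -> dict:
    """Return {url: depth} for pages reached under the two limits.

    links maps each page to the URLs it contains, in order of appearance.
    The seed URL is depth 1, pages it links to are depth 2, and so on.
    """
    depths = {seed: 1}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        depth = depths[page]
        if depth >= crawl_depth:
            continue  # links on this page would exceed the depth limit
        # Only the first max_links_per_url links on a page are followed.
        for target in links.get(page, [])[:max_links_per_url]:
            if target not in depths:
                depths[target] = depth + 1
                queue.append(target)
    return depths


if __name__ == "__main__":
    site = {
        "https://example.com/": [
            "https://example.com/a",
            "https://example.com/b",
            "https://example.com/c",
        ],
        "https://example.com/a": ["https://example.com/d"],
    }
    print(plan_crawl("https://example.com/", site, crawl_depth=2, max_links_per_url=2))
```

With a depth of 2 and two links per page, only the seed page and its first two links are reached. The third link is dropped by maxLinksPerUrl, and the page at depth 3 is dropped by crawlDepth.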

Amazon Kendra Web Crawler JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"siteMapUrls": {

"type": "array",

"items":{

"type": "string",

"pattern": "https://.*"

}

},

"s3SeedUrl": {

"type": "string",

"pattern": "s3:.*"

},

"s3SiteMapUrl": {

"type": "string",

"pattern": "s3:.*"

},

"seedUrlConnections": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"seedUrl":{

"type": "string",

"pattern": "https://.*"

}

},

"required": [

"seedUrl"

]

}

]

},

"authentication": {

"type": "string",

"enum": [

"NoAuthentication",

"BasicAuth",

"NTLM_Kerberos",

"Form",

"SAML"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"webPage": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"attachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL"

]

},

"additionalProperties": {

"type": "object",

"properties": {

"rateLimit": {

"type": "string",

"default": "300"

},

"maxFileSize": {

"type": "string",

"default": "50"

},

"crawlDepth": {

"type": "string",

"default": "2"

},

"maxLinksPerUrl": {

"type": "string",

"default": "100"

},

"crawlSubDomain": {

"type": "boolean",

"default": false

},

"crawlAllDomain": {

"type": "boolean",

"default": false

},

"honorRobots": {

"type": "boolean",

"default": false

},

"crawlAttachments": {

"type": "boolean",

"default": false

},

"inclusionURLCrawlPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionURLCrawlPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionURLIndexPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionURLIndexPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileIndexPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileIndexPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"proxy": {

"type": "object",

"properties": {

"host": {

"type": "string"

},

"port": {

"type": "string"

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"required": [

"rateLimit",

"maxFileSize",

"crawlDepth",

"crawlSubDomain",

"crawlAllDomain",

"maxLinksPerUrl",

"honorRobots"

]

},

"type": {

"type": "string",

"pattern": "WEBCRAWLERV2"

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"type",

"additionalProperties"

]

}
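
The authentication values in the schema above pair with the secret keys described in the parameter table. The following Python sketch checks that a secret's JSON contains the keys expected for a given authentication type before it is stored in AWS Secrets Manager. The mapping is one reading of the table, in particular the assumption that userNameButtonXpath is needed only for SAML. The helper missing_secret_keys and all sample values are hypothetical.

```python
import json

_BASIC = ["userName", "password"]
_FORM = _BASIC + ["userNameFieldXpath", "passwordFieldXpath",
                  "passwordButtonXpath", "loginPageUrl"]

# Keys expected in the secret for each value of the "authentication" enum.
# Assumption: only SAML needs the user name button XPath.
EXPECTED_KEYS = {
    "NoAuthentication": [],
    "BasicAuth": _BASIC,
    "NTLM_Kerberos": _BASIC,
    "Form": _FORM,
    "SAML": _FORM + ["userNameButtonXpath"],
}


def missing_secret_keys(authentication: str, secret_json: str) -> list:
    """Return the expected keys that are absent from the secret's JSON text."""
    secret = json.loads(secret_json)
    return [key for key in EXPECTED_KEYS[authentication] if key not in secret]


if __name__ == "__main__":
    secret_text = json.dumps({
        "userName": "crawler",          # placeholder credentials
        "password": "example-password",
        "userNameFieldXpath": "//input[@id='username']",
        "passwordFieldXpath": "//input[@id='password']",
        "passwordButtonXpath": "//button[@type='submit']",
        "loginPageUrl": "https://example.com/login",
    })
    print(missing_secret_keys("Form", secret_text))
    print(missing_secret_keys("SAML", secret_text))
```

An unknown authentication value raises KeyError, which is intentional here because the schema's enum allows only the five values listed.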

Conﬂuence template schema

You include a JSON that contains the data source schema as part of the TemplateConﬁguration

object. You provide the Conﬂuence host URL, the hosting method, and the authentication type

as a part of the connection conﬁguration or repository endpoint details. Also specify the type of

data source as CONFLUENCEV2, a secret for your authentication credentials, and other necessary

conﬁgurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Conﬂuence JSON schema.

The following table describes the parameters of the Conﬂuence JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

hostUrl The URL for your Confluence instance. For example,
https://example.confluence.com.

type The hosting method for your Confluence instance, either SAAS or ON_PREM.

authType The authentication method for your Confluence instance, whether Basic,
OAuth2, or Personal-token.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• space

• page

• blog

• comment

• attachment

A list of objects that map the attributes

or ﬁeld names of your Conﬂuence spaces,

pages, blogs, comments, and attachments to

Amazon Kendra index ﬁeld names. For more

information, see Mapping data source ﬁelds.

The Conﬂuence data source ﬁeld names must

exist in your Conﬂuence custom metadata.

additionalProperties Additional conﬁguration options for your

content in your data source.

isCrawlAcl

true to crawl the access control list (ACL)

information for your documents, if you have

an ACL and want to use it for access control.

The ACL specifies which documents users

and groups can access. The ACL information is

used to ﬁlter search results based on the user

or their group access to documents. For more

information, see User context ﬁltering.

ﬁeldForUserId

Specify email if you want to use the user

email for the user ID. email is used by default

and is currently the only supported user ID

type.

• inclusionSpaceKeyFilter

• exclusionSpaceKeyFilter

• pageTitleRegEX

• blogTitleRegEX

• commentTitleRegEX

• attachmentTitleRegEX

• inclusionFileTypePatterns

• exclusionFileTypePatterns

• inclusionUrlPatterns

• exclusionUrlPatterns

A list of regular expression patterns to include

and/or exclude certain ﬁles in your Conﬂuence

data source. Files that match the patterns are

included in the index. Files that don't match

the patterns are excluded from the index. If a

ﬁle matches both an inclusion and exclusion

pattern, the exclusion pattern takes precedence

and the file isn't included in the index.

proxyHost The host name of the web proxy that you use,

without the http:// or https:// protocol.

proxyPort The port number used by the host URL

transport protocol. Must be a numeric value

between 0 and 65535.

• isCrawlPersonalSpace

• isCrawlArchivedSpace

• isCrawlArchivedPage

• isCrawlPage

• isCrawlBlog

• isCrawlPageComment

• isCrawlPageAttachment

• isCrawlBlogComment

• isCrawlBlogAttachment

true to crawl files in your Confluence personal spaces, archived spaces, archived
pages, pages, blogs, page comments, page attachments, blog comments, and blog
attachments.

maxFileSizeInMegaBytes Specify the ﬁle size limit in MBs that Amazon

Kendra can crawl. Amazon Kendra crawls only

the ﬁles within the size limit you deﬁne. The

default ﬁle size is 50MB. The maximum ﬁle

size should be greater than 0MB and less than

or equal to 50MB.

type

The type of data source. Specify CONFLUENCEV2 as your data source type.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler

to sync identity/principal information on users

and groups with access to certain documents.

If identity crawler is turned oﬀ, all documents

can be publicly searched. If you want to use

access control for your documents and identity

crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user

and group access information.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretARN The Amazon Resource Name (ARN) of an AWS

Secrets Manager secret that contains the

key-value pairs required to connect to your

Conﬂuence. For information on these key-

value pairs, see Connection instructions for

Conﬂuence.

version The version of this template that is currently

supported.
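
The inclusion and exclusion rule described in the table (exclusion wins when both match) can be sketched in a few lines of Python. This is an illustration only, not the connector's implementation: the `is_indexed` helper, the sample patterns, and the choice of `re.search` for matching are assumptions, as is the behavior when no inclusion patterns are given.

```python
import re


def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Return True if `name` passes the include/exclude rule described above."""
    # An exclusion match always wins, even when an inclusion pattern also matches.
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    # Assumption: an empty inclusion list means "include everything not excluded".
    if not inclusion_patterns:
        return True
    return any(re.search(p, name) for p in inclusion_patterns)


# Hypothetical patterns, for illustration only.
inclusion = [r"\.pdf$", r"\.docx$"]
exclusion = [r"^draft-"]

results = {
    name: is_indexed(name, inclusion, exclusion)
    for name in ["report.pdf", "draft-report.pdf", "notes.txt"]
}
print(results)
```

Here `draft-report.pdf` matches both an inclusion and an exclusion pattern, so it is left out, which mirrors the precedence rule in the table.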

Conﬂuence JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"hostUrl": {

"type": "string",

"pattern": "https:.*"

},

"type": {

"type": "string",

"enum": [

"SAAS",

"ON_PREM"

]

},

"authType": {

"type": "string",

"enum": [

"Basic",

"OAuth2",

"Personal-token"

]

}

},

"required": [

"hostUrl",

"type",

"authType"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"space": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"page": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"blog": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"comment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"attachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"usersAclS3FilePath": {

"type": "string"

},

"isCrawlAcl": {

"type": "boolean"

},

"fieldForUserId": {

"type": "string"

},

"inclusionSpaceKeyFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionSpaceKeyFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"pageTitleRegEX": {

"type": "array",

"items": {

"type": "string"

}

},

"blogTitleRegEX": {

"type": "array",

"items": {

"type": "string"

}

},

"commentTitleRegEX": {

"type": "array",

"items": {

"type": "string"

}

},

"attachmentTitleRegEX": {

"type": "array",

"items": {

"type": "string"

}

},

"isCrawlPersonalSpace": {

"type": "boolean"

},

"isCrawlArchivedSpace": {

"type": "boolean"

},

"isCrawlArchivedPage": {

"type": "boolean"

},

"isCrawlPage": {

"type": "boolean"

},

"isCrawlBlog": {

"type": "boolean"

},

"isCrawlPageComment": {

"type": "boolean"

},

"isCrawlPageAttachment": {

"type": "boolean"

},

"isCrawlBlogComment": {

"type": "boolean"

},

"isCrawlBlogAttachment": {

"type": "boolean"

},

"maxFileSizeInMegaBytes": {

"type":"string"

},

"inclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionUrlPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionUrlPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"proxyHost": {

"type": "string"

},

"proxyPort": {

"type": "string"

}

},

"required": []

},

"type": {

"type": "string",

"pattern": "CONFLUENCEV2"

},

"enableIdentityCrawler": {

"type": "boolean"

},

"syncMode": {

"type": "string",

"enum": [

"FULL_CRAWL",

"FORCED_FULL_CRAWL"

]

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
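
As a rough sketch of what a document conforming to the Confluence schema might look like, the following Python builds a small template and checks it against the top-level `required` list shown above. The host URL, secret ARN, field names, and space key are placeholders invented for illustration; only the key names and enum values come from the schema.

```python
import json

# Top-level keys listed under "required" in the Confluence JSON schema above.
REQUIRED_KEYS = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]

confluence_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "hostUrl": "https://example.atlassian.net",  # placeholder host
            "type": "SAAS",
            "authType": "Basic",
        }
    },
    "repositoryConfigurations": {
        "page": {
            "fieldMappings": [
                {
                    "indexFieldName": "page_title",  # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "title",  # hypothetical source field
                }
            ]
        }
    },
    "additionalProperties": {
        "isCrawlPage": True,
        "isCrawlPageComment": False,
        "maxFileSizeInMegaBytes": "50",  # the schema types this value as a string
        "exclusionSpaceKeyFilter": ["ARCHIVE"],  # hypothetical space key
    },
    "type": "CONFLUENCEV2",
    "syncMode": "FULL_CRAWL",
    "enableIdentityCrawler": True,
    # Placeholder ARN; replace with the ARN of your own secret.
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "version": "1.0.0",
}

missing = [key for key in REQUIRED_KEYS if key not in confluence_template]
template_json = json.dumps(confluence_template, indent=2)
print("missing required keys:", missing)
```

The check covers only the presence of top-level keys. A full JSON Schema validator would be needed to enforce the nested constraints, such as the enum values and the `secretArn` length limits.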

Dropbox template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration

object. You provide the Dropbox app key, app secret, and access token as part of your secret that

stores your authentication credentials. Also specify the type of data source as DROPBOX, the type of

access token you want to use (temporary or permanent), and other necessary conﬁgurations. You

then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Dropbox JSON schema.
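
The steps above can be sketched as follows: build the template as a Python dictionary, then wrap it in the arguments for CreateDataSource with TEMPLATE as the Type. The helper name `build_create_data_source_kwargs`, the index ID, the role and secret ARNs, and the field mapping are placeholders assumed for illustration; the template keys follow the Dropbox JSON schema in this guide.

```python
def build_create_data_source_kwargs(index_id, name, role_arn, template):
    """Assemble keyword arguments for the CreateDataSource operation."""
    return {
        "IndexId": index_id,
        "Name": name,
        "RoleArn": role_arn,
        "Type": "TEMPLATE",
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


dropbox_template = {
    # Dropbox does not specify an endpoint, so this object stays empty.
    "connectionConfiguration": {"repositoryEndpointMetadata": {}},
    "repositoryConfigurations": {
        "file": {
            "fieldMappings": [
                {
                    "indexFieldName": "dropbox_file_name",  # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "name",  # hypothetical source field
                }
            ]
        }
    },
    "additionalProperties": {"isCrawlAcl": True, "crawlFile": True},
    "syncMode": "FULL_CRAWL",
    "enableIdentityCrawler": True,
    # Placeholder ARN; replace with the ARN of your own secret.
    "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "type": "DROPBOX",
    "tokenType": "PERMANENT",
    "version": "1.0.0",
}

kwargs = build_create_data_source_kwargs(
    index_id="example-index-id",  # placeholder
    name="example-dropbox-source",  # placeholder
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",  # placeholder
    template=dropbox_template,
)

# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("kendra").create_data_source(**kwargs)
print(kwargs["Type"], kwargs["Configuration"]["TemplateConfiguration"]["Template"]["type"])
```

Keeping the request assembly separate from the API call makes it possible to inspect or serialize the template before anything is sent to the service.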

The following table describes the parameters of the Dropbox JSON schema.

Conﬁguration Description

connectionConfiguration Configuration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

This data source does not specify an endpoint

in repositoryEndpointMetadata.

Rather, the connection information is included

in an AWS Secrets Manager secret whose ARN

you provide in secretArn.

repositoryConfigurations Configuration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• file

• paper

• papert

• shortcut

A list of objects that map the attributes or

ﬁeld names of your Dropbox ﬁles, Dropbox

Paper, and shortcuts to Amazon Kendra

index ﬁeld names. For more information, see

Mapping data source ﬁelds.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler

to sync identity/principal information on users

and groups with access to certain documents.

If identity crawler is turned oﬀ, all documents

can be publicly searched. If you want to use

access control for your documents and identity

crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user

and group access information.

secretARN The Amazon Resource Name (ARN) of an AWS

Secrets Manager secret that contains the

key-value pairs required to connect to your

Dropbox. The secret must contain a JSON

structure with the following keys:

{

"appKey": "Dropbox app key",

"appSecret": "Dropbox app secret",

"accesstoken": "temporary access token or refresh access token"

}

additionalProperties Additional conﬁguration options for your

content in your data source.

isCrawlAcl

true to crawl the access control list (ACL)

information for your documents, if you have

an ACL and want to use it for access control.

The ACL specifies which documents users

and groups can access. The ACL information is

used to ﬁlter search results based on the user

or their group access to documents. For more

information, see User context ﬁltering.

• inclusionFileNamePatterns

• inclusionFileTypePatterns

A list of regular expression patterns to include

certain ﬁle names and types in your Dropbox

data source. Files that match the patterns are

included in the index. Files that don't match

the patterns are excluded from the index. If a

ﬁle matches both an inclusion and exclusion

pattern, the exclusion pattern takes precedence

and the file isn't included in the index.

• exclusionFileNamePatterns

• exclusionFileTypePatterns

A list of regular expression patterns to exclude

certain ﬁle names and types in your Dropbox

data source. Files that match the patterns

are excluded from the index. Files that don't

match the patterns are included in the index.

If a ﬁle matches both an exclusion and

inclusion pattern, the exclusion pattern takes

precedence and the ﬁle isn't included in the

index.

• crawlFile

• crawlPaper

• crawlPapert

• crawlShortcut

true to crawl files in your Dropbox, Dropbox

Paper documents, Dropbox Paper templates,

and web page shortcuts stored in your

Dropbox.

type

The type of data source. Specify DROPBOX as

your data source type.

tokenType Specify your access token type: permanent

or temporary access token. It's recommended

that you create a refresh access token that

never expires in Dropbox rather than relying on

a one-time access token that expires after 4

hours. You create an app and a refresh access

token in the Dropbox developer console and

provide the access token in your secret.

version The version of this template that is currently

supported.
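
The secret structure listed in the table can be produced with a short helper. This is a sketch under stated assumptions: `build_dropbox_secret` and the sample values are hypothetical, and only the three key names (`appKey`, `appSecret`, `accesstoken`) are taken from the table.

```python
import json


def build_dropbox_secret(app_key, app_secret, access_token):
    """Serialize the three keys the Dropbox secret must contain."""
    return json.dumps(
        {
            "appKey": app_key,
            "appSecret": app_secret,
            "accesstoken": access_token,
        }
    )


# Placeholder values; never hard-code real credentials.
secret_string = build_dropbox_secret("example-key", "example-secret", "example-token")

# With boto3 installed and credentials configured, the string could be stored with:
#   boto3.client("secretsmanager").create_secret(
#       Name="example-dropbox-secret", SecretString=secret_string)
print(sorted(json.loads(secret_string)))
```

The ARN returned when the secret is created is the value you would then supply as `secretArn` in the template.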

Dropbox JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"file": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"LONG",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"paper": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"LONG",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"papert": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"LONG",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"shortcut": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"LONG",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"syncMode": {

"type": "string",

"enum": [

"FULL_CRAWL",

"FORCED_FULL_CRAWL",

"CHANGE_LOG"

]

},

"enableIdentityCrawler": {

"type": "boolean"

},

"secretArn": {

"type": "string"

},

"additionalProperties": {

"type": "object",

"properties": {

"isCrawlAcl": {

"type": "boolean"

},

"inclusionFileNamePatterns": {

"type": "array"

},

"exclusionFileNamePatterns": {

"type": "array"

},

"inclusionFileTypePatterns": {

"type": "array"

},

"exclusionFileTypePatterns": {

"type": "array"

},

"crawlFile": {

"type": "boolean"

},

"crawlPaper": {

"type": "boolean"

},

"crawlPapert": {

"type": "boolean"

},

"crawlShortcut": {

"type": "boolean"

}

},

"type": {

"type": "string",

"pattern": "DROPBOX"

},

"tokenType": {

"type": "string",

"enum": [

"PERMANENT",

"TEMPORARY"

]

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

}

},

"additionalProperties": false,

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"additionalProperties",

"syncMode",

"enableIdentityCrawler",

"secretArn",

"type",

"tokenType"

]

}

Drupal template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration

object. You provide the Drupal host URL and the authentication type as part of the connection

conﬁguration or repository endpoint details. Also specify the type of data source as DRUPAL, a

secret for your authentication credentials, and other necessary conﬁgurations. You then specify

TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Drupal JSON schema.

The following table describes the parameters of the Drupal JSON schema.

Conﬁguration Description

connectionConfiguration Configuration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

hostUrl The host URL of your Drupal website. For

example, https://<hostname>/<drupalsitename>.

repositoryConfigurations Configuration information for the content of

the data source.

• content

• comment

• attachment

A list of objects that map the attributes or

ﬁeld names of your Drupal ﬁles. For more

information, see Mapping data source ﬁelds.

The Drupal data source ﬁeld names must exist

in your Drupal custom metadata.

additionalProperties Additional conﬁguration options for your

content in your data source.

• inclusionFileNamePatterns

• articleTitleInclusionPatterns

• pageTitleInclusionPatterns

• customContentTitleInclusionPatterns

• basicBlockTitleInclusionPatterns

• customBlockTitleInclusionPatterns

A list of regular expression patterns to include

certain ﬁles in your Drupal data source. Files

that match the patterns are included in the

index. Files that don't match the patterns

are excluded from the index. If a ﬁle matches

both an inclusion and exclusion pattern, the

exclusion pattern takes precedence and the

ﬁle isn't included in the index.

• exclusionFileNamePatterns

• articleTitleExclusionPatterns

• pageTitleExclusionPatterns

• customContentTitleExclusionPatterns

• basicBlockTitleExclusionPatterns

• customBlockTitleExclusionPatterns

A list of regular expression patterns to exclude

certain ﬁles in your Drupal data source. Files

that match the patterns are excluded from the

index. Files that don't match the patterns are

included in the index. If a ﬁle matches both an

exclusion and inclusion pattern, the exclusion

pattern takes precedence and the ﬁle isn't

included in the index.

contentDefinitions

• contentType

• fieldDefinition

• isCrawlComments

• isCrawlFiles

• isCrawlArticle

• isCrawlBasicPage

• isCrawlBasicBlock

• isCrawlCustomContentTypesList

Specify the content types to crawl and

whether to crawl comments and attachments

for your selected content types.

type

The type of data source. Specify DRUPAL as

your data source type.

authType The type of authentication that you use,

whether BASIC-AUTH or OAUTH2.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler

to sync identity/principal information on users

and groups with access to certain documents.

If identity crawler is turned oﬀ, all documents

can be publicly searched. If you want to use

access control for your documents and identity

crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user

and group access information.

secretARN The Amazon Resource Name (ARN) of an AWS

Secrets Manager secret that contains the key-

value pairs required to connect to your Drupal.

The secret must contain a JSON structure with

the following keys:

If using basic authentication:

{

"username": "user name",

"passwords": "password"

}

If using OAuth 2.0 authentication:

{

"username": "user name",

"password": "password",

"clientId": "client id",

"clientSecret": "client secret"

}

version The version of this template that is currently

supported.

Drupal JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"hostUrl": {

"type": "string",

"pattern": "https:.*"

}

},

"required": [

"hostUrl"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"content": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"comment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"attachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"isCrawlArticle": {

"type": "boolean"

},

"isCrawlBasicPage": {

"type": "boolean"

},

"isCrawlBasicBlock": {

"type": "boolean"

},

"crawlCustomContentTypesList": {

"type": "array",

"items": {

"type": "string"

}

},

"crawlCustomBlockTypesList": {

"type": "array",

"items": {

"type": "string"

}

},

"filePath": {

"anyOf": [

{

"type": "string",

"pattern": "s3:.*"

},

{

"type": "string",

"pattern": ""

}

]

},

"inclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"articleTitleInclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"articleTitleExclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"pageTitleInclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"pageTitleExclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"customContentTitleInclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"customContentTitleExclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"basicBlockTitleInclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"basicBlockTitleExclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"customBlockTitleInclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"customBlockTitleExclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"contentDefinitions": {

"type": "array",

"items": {

"properties": {

"contentType": {

"type": "string"

},

"fieldDefinition": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"machineName": {

"type": "string"

},

"type": {

"type": "string"

}

},

"required": [

"machineName",

"type"

]

}

]

},

"isCrawlComments": {

"type": "boolean"

},

"isCrawlFiles": {

"type": "boolean"

}

},

"required": [

"contentType",

"fieldDefinition",

"isCrawlComments",

"isCrawlFiles"

]

}

},

"required": []

},

"type": {

"type": "string",

"pattern": "DRUPAL"

},

"authType": {

"type": "string",

"enum": [

"BASIC-AUTH",

"OAUTH2"

]

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"enableIdentityCrawler": {

"type": "boolean"

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
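
The `contentDefinitions` part of the Drupal schema lists four required keys per entry, plus two required keys per `fieldDefinition` item. A minimal sketch of checking an entry against those lists might look like this. The `missing_keys` helper and the sample content types and machine names are hypothetical; the required key names are copied from the schema above.

```python
# Required keys copied from the contentDefinitions section of the Drupal schema.
REQUIRED_DEFINITION_KEYS = ["contentType", "fieldDefinition", "isCrawlComments", "isCrawlFiles"]
REQUIRED_FIELD_KEYS = ["machineName", "type"]


def missing_keys(definition):
    """List required keys absent from one contentDefinitions entry."""
    missing = [k for k in REQUIRED_DEFINITION_KEYS if k not in definition]
    for i, field in enumerate(definition.get("fieldDefinition", [])):
        missing += [f"fieldDefinition[{i}].{k}" for k in REQUIRED_FIELD_KEYS if k not in field]
    return missing


# Hypothetical entries, for illustration only.
article = {
    "contentType": "article",
    "fieldDefinition": [{"machineName": "field_tags", "type": "entity_reference"}],
    "isCrawlComments": True,
    "isCrawlFiles": False,
}
incomplete = {"contentType": "page", "fieldDefinition": [{"machineName": "body"}]}

print(missing_keys(article))
print(missing_keys(incomplete))
```

The first entry reports no missing keys. The second reports `isCrawlComments`, `isCrawlFiles`, and `fieldDefinition[0].type`, which are the gaps a schema validator would also flag.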

GitHub template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration

object. You provide the GitHub host URL, the organization name, and whether you use GitHub

cloud or GitHub on-premises as part of the connection conﬁguration or repository endpoint

details. Also specify the type of data source as GITHUB, a secret for your authentication credentials,

and other necessary conﬁgurations. You then specify TEMPLATE as the Type when you call

CreateDataSource.

You can use the template provided in this developer guide. See GitHub JSON schema.

The following table describes the parameters of the GitHub JSON schema.

Conﬁguration Description

connectionConfiguration Configuration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

type

Specify the type as either SAAS or ON_PREMISE.

hostUrl The GitHub host URL. For example, if you

use GitHub SaaS/Enterprise Cloud: https://api.github.com.

Or, if you use GitHub on-premises/Enterprise Server:

https://on-prem-host-url/api/v3/.

organizationName You can ﬁnd your organization name when

you log in to GitHub desktop and go to Your

organizations under your proﬁle picture

dropdown.

repositoryConfigurations Configuration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• ghRepository

• ghCommit

• ghIssueDocument

• ghIssueComment

• ghIssueAttachment

• ghPRDocument

• ghPRComment

• ghPRAttachment

A list of objects that map the attributes or

field names of your GitHub content to Amazon

Kendra index field names. For more information,

see Mapping data source fields.

additionalProperties Additional conﬁguration options for your

content in your data source.

isCrawlAcl

true to crawl the access control list (ACL)

information for your documents, if you have

an ACL and want to use it for access control.

The ACL specifies which documents users

and groups can access and search. The ACL

information is used to ﬁlter search results

based on the user or their group access to

documents. For more information, see User

context ﬁltering.

fieldForUserId Specify the type of user ID that you want to

use for ACL crawling. Specify either email if

you want to use the user email for the user ID,

or username if you want to use the user name

for the user ID. If you don't specify an option

then email is used by default.

repositoryFilter A list of names of the speciﬁc repositories and

branch names you want to index.

crawlRepository

true to crawl repositories.

crawlRepositoryDocuments

true to crawl repository documents.

crawlIssue

true to crawl issues.

crawlIssueComment

true to crawl issue comments.

crawlIssueCommentAttachment

true to crawl issue comment attachments.

crawlPullRequest

true to crawl pull requests.

crawlPullRequestComment

true to crawl pull request comments.

crawlPullRequestCommentAttachment

true to crawl pull request comment

attachments.

• inclusionFolderNamePatterns

• inclusionFileTypePatterns

• inclusionFileNamePatterns

A list of regular expression patterns to

include certain content in your GitHub data

source. Content that matches the patterns

is included in the index. Content that

doesn't match the patterns is excluded from

the index. If any content matches both an

inclusion and exclusion pattern, the exclusion

pattern takes precedence, and the content

isn't included in the index.

• exclusionFolderNamePatterns

• exclusionFileTypePatterns

• exclusionFileNamePatterns

A list of regular expression patterns to exclude

certain content in your GitHub data source.

Content that matches the patterns is

excluded from the index. Content that doesn't

match the patterns is included in the index.

If any content matches both an inclusion and

exclusion pattern, the exclusion pattern takes

precedence, and the content isn't included in

the index.

type

The type of data source. Specify GITHUB as

your data source type.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler

to sync identity/principal information on users

and groups with access to certain documents.

If identity crawler is turned oﬀ, all documents

can be publicly searched. If you want to use

access control for your documents and identity

crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user

and group access information.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to GitHub. The secret must contain a JSON structure with the following keys:

{

"personalToken": "token"

}

version The version of this template that's currently

supported.
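The precedence rule for inclusion and exclusion patterns described in the table above can be sketched in Python as follows. This is only an illustration of the rule, not Amazon Kendra's implementation: the function name `is_indexed`, the sample file paths, the use of `re.search` (rather than a full match), and the choice to keep everything when no inclusion patterns are given are all assumptions.

```python
import re

def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Return True if `name` would be kept under the precedence rule (illustrative only)."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out.
    if not inclusion_patterns:
        return True
    return any(re.search(p, name) for p in inclusion_patterns)

# Hypothetical file paths and patterns.
files = ["README.md", "docs/guide.md", "build/output.md", "src/main.py"]
kept = [f for f in files if is_indexed(f, [r"\.md$"], [r"^build/"])]
print(kept)
```

Here `build/output.md` matches both an inclusion and an exclusion pattern, so it is dropped, and `src/main.py` is dropped because it matches no inclusion pattern.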

GitHub JSON schema

The following is the GitHub JSON schema:

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"type": {

"type": "string"

},

"hostUrl": {

"type": "string",

"pattern": "https://.*"

},

"organizationName": {

"type": "string"

}

},

"required": [

"type",

"hostUrl",

"organizationName"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"ghRepository": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"ghCommit": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"ghIssueDocument": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"ghIssueComment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"ghIssueAttachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"ghPRDocument": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"ghPRComment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"ghPRAttachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"isCrawlAcl": {

"type": "boolean"

},

"fieldForUserId": {

"type": "string"

},

"crawlRepository": {

"type": "boolean"

},

"crawlRepositoryDocuments": {

"type": "boolean"

},

"crawlIssue": {

"type": "boolean"

},

"crawlIssueComment": {

"type": "boolean"

},

"crawlIssueCommentAttachment": {

"type": "boolean"

},

"crawlPullRequest": {

"type": "boolean"

},

"crawlPullRequestComment": {

"type": "boolean"

},

"crawlPullRequestCommentAttachment": {

"type": "boolean"

},

"repositoryFilter": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"repositoryName": {

"type": "string"

},

"branchNameList": {

"type": "array",

"items": {

"type": "string"

}

}

}

}

]

},

"inclusionFolderNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFolderNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"required": []

},

"type": {

"type": "string",

"pattern": "GITHUB"

},

"syncMode": {

"type": "string",

"enum": [

"FULL_CRAWL",

"FORCED_FULL_CRAWL",

"CHANGE_LOG"

]

},

"enableIdentityCrawler": {

"type": "boolean"

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"enableIdentityCrawler"

]

}
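Each fieldMappings entry in the schema above requires indexFieldName, indexFieldType, and dataSourceFieldName, and restricts indexFieldType to STRING, STRING_LIST, or DATE. A minimal Python sketch of building such entries is shown below. The helper name `make_field_mapping` and the field names `gh_title`, `title`, and `createdAt` are hypothetical, and attaching dateFieldFormat only to DATE fields is a design choice of this sketch rather than something the schema states.

```python
# Values copied from the schema above.
ALLOWED_INDEX_FIELD_TYPES = {"STRING", "STRING_LIST", "DATE"}
DATE_FIELD_FORMAT = "yyyy-MM-dd'T'HH:mm:ss'Z'"

def make_field_mapping(index_field_name, index_field_type, data_source_field_name):
    """Build one fieldMappings entry with the three required keys."""
    if index_field_type not in ALLOWED_INDEX_FIELD_TYPES:
        raise ValueError(f"unsupported indexFieldType: {index_field_type!r}")
    mapping = {
        "indexFieldName": index_field_name,
        "indexFieldType": index_field_type,
        "dataSourceFieldName": data_source_field_name,
    }
    # Assumption of this sketch: only DATE fields carry a dateFieldFormat.
    if index_field_type == "DATE":
        mapping["dateFieldFormat"] = DATE_FIELD_FORMAT
    return mapping

# Hypothetical field names, for illustration only.
gh_issue_document = {
    "fieldMappings": [
        make_field_mapping("gh_title", "STRING", "title"),
        make_field_mapping("_created_at", "DATE", "createdAt"),
    ]
}
```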

Gmail template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as GMAIL, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Gmail JSON schema.
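As a rough sketch of how these pieces fit together, the following Python builds the arguments for a CreateDataSource call that uses a Gmail template. The function name, the data source name, the ARNs and index ID in the example, the empty repositoryEndpointMetadata object, the empty fieldMappings list, and the chosen property values are all placeholders or assumptions. The sketch only constructs the request and does not call AWS.

```python
def build_create_data_source_kwargs(index_id, role_arn, secret_arn,
                                    name="my-gmail-data-source"):
    """Assemble keyword arguments for CreateDataSource with a Gmail template."""
    template = {
        # The Gmail template carries no endpoint details here.
        "connectionConfiguration": {"repositoryEndpointMetadata": {}},
        # Placeholder: no field mappings are defined in this sketch.
        "repositoryConfigurations": {"message": {"fieldMappings": []}},
        "additionalProperties": {
            "isCrawlAttachment": True,
            "shouldCrawlDraftMessages": False,
        },
        "type": "GMAIL",
        "syncMode": "FULL_CRAWL",
        "secretArn": secret_arn,
        "version": "1.0.0",
    }
    return {
        "Name": name,
        "IndexId": index_id,
        "RoleArn": role_arn,
        "Type": "TEMPLATE",
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }

# All identifiers below are made-up placeholders.
kwargs = build_create_data_source_kwargs(
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
)
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("kendra").create_data_source(**kwargs)
```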

The following table describes the parameters of the Gmail JSON schema.

Conﬁguration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret whose ARN you provide in secretArn.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

Specify the type of data source and the secret

ARN.

• message

• attachments

A list of objects that map the attributes or

ﬁeld names of your Gmail messages and

attachments to Amazon Kendra index ﬁeld

names. For more information, see Mapping

data source ﬁelds.

additionalProperties Additional conﬁguration options for your

content in your data source.

• inclusionLabelNamePatterns

• exclusionLabelNamePatterns

• inclusionAttachmentTypePatterns

• exclusionAttachmentTypePatterns

• inclusionAttachmentNamePatterns

• exclusionAttachmentNamePatterns

• inclusionSubjectFilter

• exclusionSubjectFilter

• isSubjectAnd

A list of regular expression patterns to include

or exclude messages with speciﬁc subject

names in your Gmail data source. Files that

match the patterns are included in the index.

If a ﬁle matches both an inclusion and an

exclusion pattern, the exclusion pattern takes

precedence, and the ﬁle isn't included in the

index.

• inclusionFromFilter

• exclusionFromFilter

• inclusionToFilter

• exclusionToFilter

• inclusionCcFilter

• exclusionCcFilter

• inclusionBccFilter

• exclusionBccFilter

beforeDateFilter Specify messages and attachments to be

included before a certain date.

afterDateFilter Specify messages and attachments to be

included after a certain date.

isCrawlAttachment A Boolean value to choose whether you want to crawl attachments. Messages are automatically crawled.

type

The type of data source. Specify GMAIL as

your data source type.

shouldCrawlDraftMessages A Boolean value to choose whether you want

to crawl draft messages.

syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

Important

Because there is no API to update permanently deleted Gmail messages, any new, modified, or deleted content sync:

• Won't remove messages that were permanently deleted from Gmail from your Amazon Kendra index

• Won't sync changes in Gmail email labels

To sync your Gmail data source label changes and permanently deleted email messages to your Amazon Kendra index, you must run full crawls periodically.

secretArn The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key-value pairs required to connect to your Gmail. The secret must contain a JSON structure with the following keys:

{

"adminAccountEmailId": "service account email",

"clientEmailId": "user account email",

"privateKey": "private key"

}

version The version of the template that is currently

supported.

Gmail JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

}

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"message": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"attachments": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING"]

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": []

},

"additionalProperties": {

"type": "object",

"properties": {

"inclusionLabelNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionLabelNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionAttachmentTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionAttachmentTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionAttachmentNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionAttachmentNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionSubjectFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionSubjectFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"isSubjectAnd": {

"type": "boolean"

},

"inclusionFromFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFromFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionToFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionToFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionCcFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionCcFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionBccFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionBccFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"beforeDateFilter": {

"anyOf": [

{

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

},

{

"type": "string",

"pattern": ""

}

]

},

"afterDateFilter": {

"anyOf": [

{

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

},

{

"type": "string",

"pattern": ""

}

]

},

"isCrawlAttachment": {

"type": "boolean"

},

"shouldCrawlDraftMessages": {

"type": "boolean"

}

},

"required": [

"isCrawlAttachment",

"shouldCrawlDraftMessages"

]

},

"type" : {

"type" : "string",

"pattern": "GMAIL"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL"

]

},

"secretArn": {

"type": "string"

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

}

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"additionalProperties",

"syncMode",

"secretArn",

"type"

]

}
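The beforeDateFilter and afterDateFilter properties in the schema above accept a UTC timestamp of the form YYYY-MM-DDThh:mm:ssZ. The sketch below, offered as one possible way to produce and check such values, reuses the regular expression from the schema. The function names are hypothetical, and the schema's second, empty-pattern alternative is not modeled here.

```python
import re
from datetime import datetime, timezone

# Timestamp pattern copied from the schema above.
DATE_FILTER_RE = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
)

def is_valid_date_filter(value):
    """Check a string against the timestamp form of the date filter."""
    return DATE_FILTER_RE.fullmatch(value) is not None

def to_date_filter(moment):
    """Format a timezone-aware datetime as a UTC date filter string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

after_date = to_date_filter(datetime(2023, 1, 31, 12, 0, 0, tzinfo=timezone.utc))
print(after_date, is_valid_date_filter(after_date))
```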

Google Drive template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as GOOGLEDRIVEV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Google Drive JSON schema.

The following table describes the parameters of the Google Drive JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the data source.

repositoryEndpointMetadata The endpoint information for the data source. This data source does not specify an endpoint. You choose your authentication type: serviceAccount or OAuth2. The connection information is included in an AWS Secrets Manager secret whose ARN you provide in secretArn.

authType

Choose between serviceAccount and OAuth2 based on your use case.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• ﬁle

• comment

A list of objects that map the attributes or field names of your Google Drive to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties Additional conﬁguration options for your

content in your data source

• maxFileSizeInMegaBytes Specify a ﬁle size limit in MBs that Amazon

Kendra should crawl.

• isCrawlComment

true to crawl comments in your Google Drive

data source.

• isCrawlMyDriveAndSharedWithMe

true to crawl MyDrive and Shared With Me

Drives in your Google Drive data source.

• isCrawlSharedDrives

true to crawl Shared Drives in your Google

Drive data source.

isCrawlAcl

true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.

• excludeUserAccounts

• excludeSharedDrives

• excludeMimeTypes

• exclusionFileTypePatterns

• exclusionFileNamePatterns

• exclusionFilePathFilter

A list of regular expression patterns to exclude certain files in your Google Drive data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.

• includeUserAccounts

• includeSharedDrives

• includeMimeTypes

• inclusionFileTypePatterns

• inclusionFileNamePatterns

• inclusionFilePathFilter

A list of regular expression patterns to include

certain ﬁles in your Google Drive data source.

Files that match the patterns are included in

the index. Files that don't match the patterns

are excluded from the index. If a ﬁle matches

both an inclusion and exclusion pattern, the

exclusion pattern takes precedence, and the

ﬁle isn't included in the index.

type

The type of data source. Specify GOOGLEDRIVEV2 as your data source type.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.

syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Google Drive. The secret must contain a JSON structure with the following keys:

If using Google Service Account authentication:

{

"clientEmail": "user account email",

"adminAccountEmail": "service account email",

"privateKey": "private key"

}

If using OAuth 2.0 authentication:

{

"clientID": "OAuth client ID",

"clientSecret": "client secret",

"refreshToken": "refresh token"

}

version The version of this template that is currently

supported.
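Because the keys required in the secret depend on the authType you choose, it can help to check the payload before storing it in AWS Secrets Manager. The following Python sketch does that under stated assumptions: the key names come from the table above, while the helper name `build_drive_secret` and all example values are hypothetical.

```python
import json

# Key names taken from the table above, grouped by authType.
SECRET_KEYS = {
    "serviceAccount": ("clientEmail", "adminAccountEmail", "privateKey"),
    "OAuth2": ("clientID", "clientSecret", "refreshToken"),
}

def build_drive_secret(auth_type, **values):
    """Return the secret JSON string for the given authType."""
    if auth_type not in SECRET_KEYS:
        raise ValueError(f"unknown authType: {auth_type!r}")
    expected = SECRET_KEYS[auth_type]
    missing = [key for key in expected if key not in values]
    if missing:
        raise ValueError(f"missing secret keys: {missing}")
    # Keep only the expected keys so unrelated values are not stored.
    return json.dumps({key: values[key] for key in expected})

# Placeholder values, for illustration only.
secret_string = build_drive_secret(
    "OAuth2",
    clientID="example-client-id",
    clientSecret="example-client-secret",
    refreshToken="example-refresh-token",
)
```

The resulting string is what you would store as the secret value whose ARN you then supply in secretArn.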

Google Drive JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"authType": {

"type": "string",

"enum": [

"serviceAccount",

"OAuth2"

]

}

},

"required": [

"authType"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"file": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"STRING_LIST",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"comment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"STRING_LIST"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"maxFileSizeInMegaBytes": {

"type": "string"

},

"isCrawlComment": {

"type": "boolean"

},

"isCrawlMyDriveAndSharedWithMe": {

"type": "boolean"

},

"isCrawlSharedDrives": {

"type": "boolean"

},

"isCrawlAcl": {

"type": "boolean"

},

"excludeUserAccounts": {

"type": "array",

"items": {

"type": "string"

}

},

"excludeSharedDrives": {

"type": "array",

"items": {

"type": "string"

}

},

"excludeMimeTypes": {

"type": "array",

"items": {

"type": "string"

}

},

"includeUserAccounts": {

"type": "array",

"items": {

"type": "string"

}

},

"includeSharedDrives": {

"type": "array",

"items": {

"type": "string"

}

},

"includeMimeTypes": {

"type": "array",

"items": {

"type": "string"

}

},

"includeTargetAudienceGroup": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFilePathFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFilePathFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"type": {

"type": "string",

"pattern": "GOOGLEDRIVEV2"

},

"enableIdentityCrawler": {

"type": "boolean"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}

IBM DB2 template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as db2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See IBM DB2 JSON schema.

The following table describes the parameters of the IBM DB2 JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata Required conﬁguration information for

connecting your data source.

• dbType: The type of Java database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.

• dbHost: The database host name.

• dbPort: The database port.

• dbInstance: The database instance.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

Specify the type of data source and the secret

ARN.

document A list of objects that map the attributes or

ﬁeld names of your database content to

Amazon Kendra index ﬁeld names. For more

information, see Mapping data source ﬁelds.

additionalProperties Additional conﬁguration options for your

content in your data source. Use to include or

exclude speciﬁc content in your database data

source.

primaryKey Provide the primary key for the database

table. This identiﬁes a table within your

database.

titleColumn Provide the name of the document title

column within your database table.

bodyColumn Provide the name of the document body column within your database table.

sqlQuery Enter SQL query statements like SELECT and

JOIN operations. SQL queries must be less

than 32KB. Amazon Kendra will crawl all

database content that matches your query.

timestampColumn Enter the name of the column which contains

time stamps. Amazon Kendra uses time stamp

information to detect changes in your content

and sync only changed content.

timestampFormat Enter the name of the column which contains

time stamp formats to use to detect content

changes and re-sync your content.

timezone Enter the name of the column which contains

time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon

Kendra will use to detect content changes.

Amazon Kendra will re-index content when

there is a change in any of these columns

allowedUsersColumn Enter the name of the column which contains User IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column which contains groups to be allowed access to content.

sourceURIColumn Enter the name of the column which contains

Source URLs to be indexed.

isSslEnabled A Boolean value to choose whether to use SSL when connecting to your database.

type

The type of data source. Specify JDBC as your

data source type.

syncMode

Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of a

Secrets Manager secret that contains user

name and password required to connect to

your database. The secret must contain a

JSON structure with the following keys:

{

"user name": "database user name",

"password": " password"

}

version The version of the template that is currently

supported.

IBM DB2 JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},


"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
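As a rough illustration of a document that would satisfy the schema above, the following Python sketch builds a DB2 template and checks the keys that the schema lists as required. The host, instance, column names, and query are placeholders, and `build_db2_template` and `check_template` are hypothetical helper names, not part of any AWS SDK. The semicolon check mirrors the `not`/`pattern` rule on `sqlQuery`.

```python
# Keys that the schema lists under "required".
REQUIRED_TOP_LEVEL = [
    "connectionConfiguration", "repositoryConfigurations", "syncMode",
    "additionalProperties", "secretArn", "type",
]
REQUIRED_ADDITIONAL = ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]


def build_db2_template(secret_arn):
    """Return an illustrative DB2 template. Every value is a placeholder."""
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "dbType": "db2",
                "dbHost": "db2.example.com",
                "dbPort": "50000",  # the schema types the port as a string
                "dbInstance": "SAMPLE",
            }
        },
        "repositoryConfigurations": {
            "document": {
                "fieldMappings": [
                    {
                        "indexFieldName": "_document_title",
                        "indexFieldType": "STRING",
                        "dataSourceFieldName": "title",  # placeholder column
                    }
                ]
            }
        },
        "additionalProperties": {
            "primaryKey": "id",
            "titleColumn": "title",
            "bodyColumn": "body",
            "sqlQuery": "SELECT id, title, body FROM articles",
        },
        "type": "JDBC",
        "syncMode": "FULL_CRAWL",
        "secretArn": secret_arn,
        "version": "1.0.0",
    }


def check_template(template):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing {key}" for key in REQUIRED_TOP_LEVEL if key not in template]
    extra = template.get("additionalProperties", {})
    problems += [
        f"missing additionalProperties.{key}"
        for key in REQUIRED_ADDITIONAL
        if key not in extra
    ]
    # The schema rejects any sqlQuery that contains one or more semicolons.
    if ";" in extra.get("sqlQuery", ""):
        problems.append("sqlQuery must not contain a semicolon")
    return problems
```

This check covers only the required keys and the semicolon rule. It is not a substitute for validating against the full schema.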


Microsoft Exchange template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration
object. You provide the tenant ID as part of the connection configuration or repository
endpoint details. Also specify the type of data source as MSEXCHANGE, a secret for your
authentication credentials, and other necessary configurations. You then specify TEMPLATE as the
Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft Exchange JSON schema.
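The call described above might look like the following Python sketch, which assumes the AWS SDK for Python (boto3) and passes the template under `Configuration.TemplateConfiguration.Template`. The data source name, the field mapping, and the helper names `build_exchange_template`, `build_create_request`, and `create_data_source` are illustrative assumptions. Replace the placeholder values with your own.

```python
def build_exchange_template(tenant_id, secret_arn):
    """Return a minimal, illustrative Microsoft Exchange template."""
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"tenantId": tenant_id}
        },
        "repositoryConfigurations": {
            "email": {
                "fieldMappings": [
                    {
                        "indexFieldName": "_document_title",
                        "indexFieldType": "STRING",
                        "dataSourceFieldName": "subject",  # placeholder field name
                    }
                ]
            }
        },
        "additionalProperties": {"crawlCalendar": False},
        "type": "MSEXCHANGE",
        "syncMode": "FULL_CRAWL",
        "secretArn": secret_arn,
        "version": "1.0.0",
    }


def build_create_request(index_id, role_arn, template):
    """Assemble the CreateDataSource arguments with Type set to TEMPLATE."""
    return {
        "IndexId": index_id,
        "Name": "exchange-template-example",  # placeholder name
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }


def create_data_source(request, region="us-east-1"):
    """Send the request. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the builders work without the SDK

    client = boto3.client("kendra", region_name=region)
    return client.create_data_source(**request)["Id"]
```

Keeping the request builder separate from the network call makes it possible to inspect or log the template before anything is sent to Amazon Kendra.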

The following table describes the parameters of the Microsoft Exchange JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.

• email
• attachment
• calendar
• contacts
• notes
A list of objects that map the attributes or field names of your Microsoft Exchange data source to Amazon Kendra index fields. For more information, see Mapping data source fields.

additionalProperties Additional configuration options for content in your data source.

inclusionPatterns A list of regular expression patterns to include certain files in your Microsoft Exchange data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.

exclusionPatterns A list of regular expression patterns to exclude certain files in your Microsoft Exchange data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.

• inclusionUsersList
• inclusionUsersFileName
• inclusionDomainUsers
A list of regular expression patterns to include certain users and user files in your Microsoft Exchange data source. Users that match the patterns are included in the index. Users that don't match the patterns are excluded from the index. If a user matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index.

• exclusionUsersList
• exclusionUsersFileName
• exclusionDomainUsers
A list of regular expression patterns to exclude certain users and user files in your Microsoft Exchange data source. Users that match the patterns are excluded from the index. Users that don't match the patterns are included in the index. If a user matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the user isn't included in the index.

s3bucketName The name of your S3 bucket, if you want to use one.

• crawlCalendar
• crawlNotes
• crawlContacts
• crawlFolderAcl
true to crawl these types of content and access control information in your Microsoft Exchange data source.

startCalendarDateTime You can configure a specific start date-time for your calendar content.

endCalendarDateTime You can configure a specific end date-time for your calendar content.

subject You can configure a specific subject line for your mail content.

emailFrom You can configure a specific email for your 'From' or sender mail content.

emailTo You can configure a specific email for your 'To' or recipient mail content.


syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

type The type of data source. Specify MSEXCHANGE as your data source type.

secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft Exchange. This includes your client ID and client secret, which are generated when you create an OAuth application in the Azure portal.

version The version of this template that is currently

supported.
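The precedence rule described for the inclusion and exclusion patterns can be sketched in Python as follows. This is only an illustration of the rule as stated in the table, not the connector's implementation. The function name `is_indexed` is hypothetical, the use of `re.search` (match anywhere in the name) is an assumption, and so is the choice to index everything when no inclusion patterns are given.

```python
import re


def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Decide whether an item would be indexed under the stated precedence rule."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, name) for pattern in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, everything not excluded is indexed.
    if not inclusion_patterns:
        return True
    return any(re.search(pattern, name) for pattern in inclusion_patterns)
```

For example, `is_indexed("draft-report.pdf", [r"\.pdf$"], [r"^draft-"])` returns `False`, because the exclusion pattern takes precedence over the matching inclusion pattern.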


Microsoft Exchange JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"tenantId": {

"type": "string",
"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",

"minLength": 36,

"maxLength": 36

}

},

"required": ["tenantId"]

}

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"email": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "DATE"]

},

"dataSourceFieldName": {


"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"attachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "DATE","LONG"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},


"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"calendar": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]


}

},

"required": [

"fieldMappings"

]

},

"contacts": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"notes": {


"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": ["email"

]

},

"additionalProperties": {

"type": "object",

"properties": {

"inclusionPatterns": {


"type": "array",

"items": {

"type": "string"

}

},

"exclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionUsersList": {

"type": "array",

"items": {

"type": "string",

"format": "email"

}

},

"exclusionUsersList": {

"type": "array",

"items": {

"type": "string",

"format": "email"

}

},

"s3bucketName": {

"type": "string"

},

"inclusionUsersFileName": {

"type": "string"

},

"exclusionUsersFileName": {

"type": "string"

},

"inclusionDomainUsers": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionDomainUsers": {

"type": "array",

"items": {

"type": "string"


}

},

"crawlCalendar": {

"type": "boolean"

},

"crawlNotes": {

"type": "boolean"

},

"crawlContacts": {

"type": "boolean"

},

"crawlFolderAcl": {

"type": "boolean"

},

"startCalendarDateTime": {

"anyOf": [

{

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

},

{

"type": "string",

"pattern": ""

}

]

},

"endCalendarDateTime": {

"anyOf": [

{

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

},

{

"type": "string",

"pattern": ""

}

]

},

"subject": {

"type": "array",

"items": {

"type": "string"

}

},


"emailFrom": {

"type": "array",

"items": {

"type": "string",

"format": "email"

}

},

"emailTo": {

"type": "array",

"items": {

"type": "string",

"format": "email"

}

},

"required": [

]

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"type" : {

"type" : "string",

"pattern": "MSEXCHANGE"

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",


"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
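Two string formats in the schema above are easy to get wrong: the `tenantId` pattern accepts only lowercase hexadecimal digits, and the calendar date-times must look like `2024-01-31T09:30:00Z`. The following Python sketch checks both before a template is submitted. The helper names are hypothetical. Note also that, under standard JSON Schema semantics, the empty `pattern` in the second `anyOf` branch matches any string, so a client-side check like this one may be the stricter of the two.

```python
import re
from datetime import datetime, timezone

# Patterns copied from the schema above.
TENANT_ID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)
CALENDAR_DATETIME = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
)


def is_valid_tenant_id(value):
    """True if the value is a lowercase, 36-character tenant ID."""
    return TENANT_ID.fullmatch(value) is not None


def to_calendar_datetime(moment):
    """Format a timezone-aware datetime as a UTC string for the calendar fields."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_valid_calendar_datetime(value):
    """True if the value has the date-time shape that the schema expects."""
    return CALENDAR_DATETIME.fullmatch(value) is not None
```

If your tenant ID is shown in uppercase, converting it with `str.lower()` before validation should satisfy the pattern.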

Microsoft OneDrive template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration
object. You provide the tenant ID as part of the connection configuration or repository endpoint
details. Also specify the type of data source as ONEDRIVEV2, a secret for your authentication
credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you
call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft OneDrive JSON schema.

The following table describes the parameters of the Microsoft OneDrive JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

tenantId The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.

file A list of objects that map the attributes or field names of your Microsoft OneDrive files to Amazon Kendra index field names. For more information, see Mapping data source fields.


additionalProperties Additional configuration options for your content in your data source.

• userNameFilter
• userFilterPath
• inclusionFileTypePatterns
• exclusionFileTypePatterns
• inclusionFileNamePatterns
• exclusionFileNamePatterns
• inclusionFilePathPatterns
• exclusionFilePathPatterns
• inclusionOneNoteSectionNamePatterns
• exclusionOneNoteSectionNamePatterns
• inclusionOneNotePageNamePatterns
• exclusionOneNotePageNamePatterns
You can choose to index specific files, OneNote sections, OneNote pages, and filter by user name.

isUserNameOnS3 true to provide a list of user names in a file stored in an Amazon S3 bucket.

type The type of data source. Specify ONEDRIVEV2 as your data source type.

enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.


syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your Microsoft OneDrive. The secret must contain a JSON structure with the following keys:

{
  "clientId": "client ID",
  "clientSecret": "client secret"
}


version The version of this template that is currently

supported.
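The secret structure described in the table can be produced and stored along the lines of the following Python sketch. It assumes the AWS SDK for Python (boto3) for the Secrets Manager call. The helper names `build_onedrive_secret` and `store_secret` and the default Region are illustrative assumptions. The ARN that the call returns is the value to supply as `secretArn` in the template.

```python
import json


def build_onedrive_secret(client_id, client_secret):
    """Serialize the two keys that the OneDrive secret must contain."""
    if not client_id or not client_secret:
        raise ValueError("both clientId and clientSecret are required")
    return json.dumps({"clientId": client_id, "clientSecret": client_secret})


def store_secret(name, secret_string, region="us-east-1"):
    """Create the secret. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("secretsmanager", region_name=region)
    return client.create_secret(Name=name, SecretString=secret_string)["ARN"]
```

Building the JSON with `json.dumps` rather than by string concatenation avoids quoting mistakes when a client secret contains special characters.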

Microsoft OneDrive JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"tenantId": {

"type": "string",

"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",

"minLength": 36,

"maxLength": 36

}

},

"required": [

"tenantId"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"file": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [


{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"userNameFilter": {

"type": "array",

"items": {


"type": "string"

}

},

"userFilterPath": {

"type": "string"

},

"isUserNameOnS3": {

"type": "boolean"

},

"inclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFilePathPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFilePathPatterns": {

"type": "array",

"items": {

"type": "string"

}


},

"inclusionOneNoteSectionNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionOneNoteSectionNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionOneNotePageNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionOneNotePageNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"required": []

},

"enableIdentityCrawler": {

"type": "boolean"

},

"type": {

"type": "string",

"pattern": "ONEDRIVEV2"

},

"syncMode": {

"type": "string",

"enum": [

"FULL_CRAWL",

"FORCED_FULL_CRAWL",

"CHANGE_LOG"

]

},


"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}

Microsoft SharePoint template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object.
You provide the SharePoint site URL or URLs, the domain, and a tenant ID if required, as part of
the connection configuration or repository endpoint details. Also specify the type of data source as
SHAREPOINTV2, a secret for your authentication credentials, and other necessary configurations.
You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See SharePoint JSON schema.

The following table describes the parameters of the Microsoft SharePoint JSON schema.

Configuration Description

connectionConfiguration Configuration information for the endpoint for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

tenantId The tenant ID of your SharePoint account.

domain The domain of your SharePoint account.

siteUrls The host URLs of your SharePoint account.

repositoryAdditionalProperties Additional properties to connect with the repository/data source endpoint.

s3bucketName The name of the Amazon S3 bucket that stores your Azure AD self-signed X.509 certificate.

s3certificateName The name of the Azure AD self-signed X.509 certificate stored in your Amazon S3 bucket.

authType The type of authentication that you use, whether OAuth2, OAuth2Certificate, OAuth2App, Basic, OAuth2_RefreshToken, NTLM, or Kerberos.

version The SharePoint version that you use, whether Server or Online.

onPremVersion The SharePoint Server version that you use, whether 2013, 2016, 2019, or SubscriptionEdition.

repositoryConfigurations Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.


• event
• page
• file
• link
• attachment
• comment
A list of objects that map the attributes or field names of your SharePoint content to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties Additional configuration options for your content in your data source.

• eventTitleFilterRegEx
• pageTitleFilterRegEx
• linkTitleFilterRegEx
• inclusionFilePath
• exclusionFilePath
• inclusionFileTypePatterns
• exclusionFileTypePatterns
• inclusionFileNamePatterns
• exclusionFileNamePatterns
• inclusionOneNoteSectionNamePatterns
• exclusionOneNoteSectionNamePatterns
• inclusionOneNotePageNamePatterns
• exclusionOneNotePageNamePatterns
A list of regular expression patterns to include or exclude certain content in your SharePoint data source. Content items that match the inclusion patterns are included in the index. Content items that don't match the inclusion patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the file isn't included in the index.

• crawlFiles

• crawlPages

• crawlEvents

• crawlComments

• crawlLinks

• crawlAttachments

true to crawl these types of content.


crawlAcl true to crawl the access control list (ACL) information for your documents, if you have an ACL and want to use it for access control. The ACL specifies which documents users and groups can access and search. The ACL information is used to filter search results based on the user or their group access to documents. For more information, see User context filtering.

fieldForUserId Specify either email if you want to use the user email for the user ID, or userPrincipalName if you want to use a user name for the user ID. If you don't specify an option, email is used by default.

aclConfiguration Specify either ACLWithLDAPEmailFmt, ACLWithManualEmailFmt, or ACLWithUsernameFmt.

emailDomain The domain of the email. For example, "amazon.com".

• isCrawlLocalGroupMapping

• isCrawlAdGroupMapping

true to crawl group mapping information.

proxyHost The host name of the web proxy that you use,

without the http:// or https:// protocol.

proxyPort The port number used by the host URL

transport protocol. Must be a numeric value

between 0 and 65535.

type Specify SHAREPOINTV2 as your data source type.


enableIdentityCrawler true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.

syncMode Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:

• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.

• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

• CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.


secretArn The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your SharePoint. For information on these key-value pairs, see Connection instructions for SharePoint Online and SharePoint Server.

version The version of this template that is currently

supported.

SharePoint JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"tenantId": {

"type": "string",

"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",

"minLength": 36,

"maxLength": 36

},

"domain": {

"type": "string"

},

"siteUrls": {

"type": "array",

"items": {

"type": "string",

"pattern": "https://.*"

}

},

"repositoryAdditionalProperties": {


"type": "object",

"properties": {

"s3bucketName": {

"type": "string"

},

"s3certificateName": {

"type": "string"

},

"authType": {

"type": "string",

"enum": [

"OAuth2",

"OAuth2Certificate",

"OAuth2App",

"Basic",

"OAuth2_RefreshToken",

"NTLM",

"Kerberos"

]

},

"version": {

"type": "string",

"enum": [

"Server",

"Online"

]

},

"onPremVersion": {

"type": "string",

"enum": [

"",

"2013",

"2016",

"2019",

"SubscriptionEdition"

]

}

},

"required": [

"authType",

"version"

]

}

},


"required": [

"siteUrls",

"domain",

"repositoryAdditionalProperties"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"event": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",


"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"page": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]


}

]

}

},

"required": [

"fieldMappings"

]

},

"file": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}


},

"required": [

"fieldMappings"

]

},

"link": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"


]

},

"attachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"comment": {


"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",


"properties": {

"eventTitleFilterRegEx": {

"type": "array",

"items": {

"type": "string"

}

},

"pageTitleFilterRegEx": {

"type": "array",

"items": {

"type": "string"

}

},

"linkTitleFilterRegEx": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFilePath": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFilePath": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileNamePatterns": {


"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionOneNoteSectionNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionOneNoteSectionNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionOneNotePageNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionOneNotePageNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"crawlFiles": {

"type": "boolean"

},

"crawlPages": {

"type": "boolean"

},

"crawlEvents": {

"type": "boolean"

},


"crawlComments": {

"type": "boolean"

},

"crawlLinks": {

"type": "boolean"

},

"crawlAttachments": {

"type": "boolean"

},

"crawlListData": {

"type": "boolean"

},

"crawlAcl": {

"type": "boolean"

},

"fieldForUserId": {

"type": "string"

},

"aclConfiguration": {

"type": "string",

"enum": [

"ACLWithLDAPEmailFmt",

"ACLWithManualEmailFmt",

"ACLWithUsernameFmt"

]

},

"emailDomain": {

"type": "string"

},

"isCrawlLocalGroupMapping": {

"type": "boolean"

},

"isCrawlAdGroupMapping": {

"type": "boolean"

},

"proxyHost": {

"type": "string"

},

"proxyPort": {

"type": "string"

}

},

"required": [

]

},

"type": {

"type": "string",

"pattern": "SHAREPOINTV2"

},

"enableIdentityCrawler": {

"type": "boolean"

},

"syncMode": {

"type": "string",

"enum": [

"FULL_CRAWL",

"FORCED_FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"enableIdentityCrawler",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}

Microsoft SQL Server template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as sqlserver, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft SQL Server JSON schema.

The following table describes the parameters of the Microsoft SQL Server JSON schema.

Configuration                 Description

connectionConfiguration       Configuration information for the endpoint for the data source.

repositoryEndpointMetadata    Required configuration information for connecting your data source.
                              • dbType: The type of database that you use, whether mysql, db2, postgresql, oracle, or sqlserver.
                              • dbHost: The database host name.
                              • dbPort: The database port.
                              • dbInstance: The database instance.

repositoryConfigurations      Configuration information for the content of the data source. For example, configuring specific types of content and field mappings. Specify the type of data source and the secret ARN.

document                      A list of objects that map the attributes or field names of your database content to Amazon Kendra index field names. For more information, see Mapping data source fields.

additionalProperties          Additional configuration options for your content in your data source. Use to include or exclude specific content in your database data source.

primaryKey                    Provide the primary key for the database table. This identifies a table within your database.

titleColumn                   Provide the name of the document title column within your database table.

bodyColumn                    Provide the name of the document body column within your database table.

sqlQuery                      Enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 32 KB. Amazon Kendra will crawl all database content that matches your query.

timestampColumn               Enter the name of the column that contains time stamps. Amazon Kendra uses time stamp information to detect changes in your content and sync only changed content.

timestampFormat               Enter the name of the column that contains time stamp formats to use to detect content changes and re-sync your content.

timezone                      Enter the name of the column that contains time zones for the content to be crawled.

changeDetectingColumns        Enter the names of the columns that Amazon Kendra will use to detect content changes. Amazon Kendra will re-index content when there is a change in any of these columns.

allowedUsersColumn            Enter the name of the column that contains the user IDs to be allowed access to content.

allowedGroupsColumn           Enter the name of the column that contains the groups to be allowed access to content.

sourceURIColumn               Enter the name of the column that contains the source URLs to be indexed.

isSslEnabled                  A Boolean value. Set to true to use an SSL connection to your database.

type                          The type of data source. Specify JDBC as your data source type.

syncMode                      Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:
                              • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.
                              • FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
                              • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn                     The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password required to connect to your database. The secret must contain a JSON structure with the following keys:
                              {
                                  "user name": "database user name",
                                  "password": "password"
                              }

version                       The version of the template that is currently supported.
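The parameters in the table above can be assembled into a template document along the following lines. This is a minimal Python sketch, not an official sample: the host, port, instance, secret ARN, column names, and the single field mapping are placeholder values, and missing_required is a hypothetical helper that only checks the top-level keys that the JSON schema in the next section lists as required.

```python
# Top-level keys that the Microsoft SQL Server JSON schema marks as required.
REQUIRED_TOP_LEVEL = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]


def build_sqlserver_template(host, port, instance, secret_arn, sql_query):
    """Assemble a template dict using the field names from the schema."""
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {
                "dbType": "sqlserver",
                "dbHost": host,
                "dbPort": port,  # the schema types dbPort as a string
                "dbInstance": instance,
            }
        },
        "repositoryConfigurations": {
            "document": {
                "fieldMappings": [
                    {
                        # Placeholder mapping; use your own index and column names.
                        "indexFieldName": "_document_title",
                        "indexFieldType": "STRING",
                        "dataSourceFieldName": "title",
                    }
                ]
            }
        },
        "additionalProperties": {
            # Placeholder column names for a hypothetical "docs" table.
            "primaryKey": "id",
            "titleColumn": "title",
            "bodyColumn": "body",
            "sqlQuery": sql_query,
        },
        "type": "JDBC",
        "syncMode": "FULL_CRAWL",
        "secretArn": secret_arn,
    }


def missing_required(template):
    """Return the required top-level keys that are absent from the template."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in template]


template = build_sqlserver_template(
    "db.example.com",
    "1433",
    "sales",
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "SELECT id, title, body FROM docs",
)
print(missing_required(template))  # prints []
```

With the AWS SDK for Python, a dict like this would typically be passed to create_data_source as Configuration={"TemplateConfiguration": {"Template": template}} together with Type="TEMPLATE". Check the CreateDataSource API reference for the exact request shape before relying on it.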

Microsoft SQL Server JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
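The schema above constrains sqlQuery with "not": {"pattern": ";+"}, which rejects any query that contains a semicolon, and the parameter table limits queries to less than 32 KB. A small pre-flight check might look like the following Python sketch. The function name is hypothetical, and measuring the size in UTF-8 bytes is an assumption, because the guide does not say how the limit is counted.

```python
import re

# "SQL queries must be less than 32KB" per the parameter table.
MAX_QUERY_BYTES = 32 * 1024


def check_sql_query(query):
    """Return a list of problems found in a candidate sqlQuery value."""
    problems = []
    # The schema's "not pattern" rule fails any string that contains ';'.
    if re.search(r";+", query):
        problems.append("contains a semicolon")
    # Assumption: the size limit is counted in UTF-8 encoded bytes.
    if len(query.encode("utf-8")) >= MAX_QUERY_BYTES:
        problems.append("is 32 KB or larger")
    return problems


print(check_sql_query("SELECT id, title, body FROM docs"))  # prints []
```

Running a check like this before calling CreateDataSource surfaces a trailing semicolon or an oversized query earlier than a schema validation error would.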

Microsoft Teams template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. You provide the tenant ID as a part of the connection configuration or repository endpoint details. Also specify the type of data source as MSTEAMS, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft Teams JSON schema.

The following table describes the parameters of the Microsoft Teams JSON schema.

Configuration                           Description

connectionConfiguration                 Configuration information for the endpoint for the data source.

repositoryEndpointMetadata              The endpoint information for the data source.

tenantId                                The Microsoft 365 tenant ID. You can find your tenant ID in the Properties of your Azure Active Directory Portal or in your OAuth application.

repositoryConfigurations                Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.

• chatMessage                           A list of objects that map the attributes or field names of your Microsoft Teams content to Amazon Kendra index field names. For more information, see Mapping data source fields.
• chatAttachment
• channelPost
• channelWiki
• channelAttachment
• meetingChat
• meetingFile
• meetingNote
• calendarMeeting

additionalProperties                    Additional configuration options for your content in your data source.

paymentModel                            Specifies what type of payment model to use with your Microsoft Teams data source. Model A payment models are restricted to licensing and payment models that require security compliance. Model B payment models are suitable for licensing and payment models that do not require security compliance. The schema accepts the values A, B, and Evaluation Mode.

• inclusionTeamNameFilter               A list of regular expression patterns to include certain content in your Microsoft Teams data source. Content that matches the patterns is included in the index. Content that doesn't match the patterns is excluded from the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
• inclusionChannelNameFilter
• inclusionFileNamePatterns
• inclusionFileTypePatterns
• inclusionUserEmailFilter
• inclusionOneNoteSectionNamePatterns
• inclusionOneNotePageNamePatterns

• exclusionTeamNameFilter               A list of regular expression patterns to exclude certain content in your Microsoft Teams data source. Content that matches the patterns is excluded from the index. Content that doesn't match the patterns is included in the index. If content matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence, and the content isn't included in the index.
• exclusionChannelNameFilter
• exclusionFileNamePatterns
• exclusionFileTypePatterns
• exclusionUserEmailFilter
• exclusionOneNoteSectionNamePatterns
• exclusionOneNotePageNamePatterns

• isCrawlChatMessage                    true to crawl these types of content in your Microsoft Teams data source.
• isCrawlChatAttachment
• isCrawlChannelPost
• isCrawlChannelAttachment
• isCrawlChannelWiki
• isCrawlCalendarMeeting
• isCrawlMeetingChat
• isCrawlMeetingFile
• isCrawlMeetingNote

startCalendarDateTime                   You can configure a specific start date-time for your calendar content.

endCalendarDateTime                     You can configure a specific end date-time for your calendar content.

type                                    The type of data source. Specify MSTEAMS as your data source type.

enableIdentityCrawler                   true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.

syncMode                                Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:
                                        • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.
                                        • FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
                                        • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn                               The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to Microsoft Teams. This includes the client ID and client secret that are generated when you create an OAuth application in the Azure portal.

version                                 The version of this template that is currently supported.
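As an illustration of how these parameters fit together, the following Python sketch builds a Microsoft Teams template and checks tenantId against the pattern and length limits given in the JSON schema that follows. The function name, the example crawl flags, and the exclusion pattern are hypothetical choices made for this sketch, and field mappings are left out for brevity.

```python
import re

# Pattern copied from tenantId in the Microsoft Teams JSON schema.
# The schema also fixes the length at 36 characters.
TENANT_ID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

PAYMENT_MODELS = ("A", "B", "Evaluation Mode")


def build_teams_template(tenant_id, secret_arn, payment_model="A"):
    """Assemble a template dict using the field names from the schema."""
    if len(tenant_id) != 36 or TENANT_ID.search(tenant_id) is None:
        raise ValueError("tenantId must be a lowercase, hyphenated 36-character ID")
    if payment_model not in PAYMENT_MODELS:
        raise ValueError("paymentModel must be one of %r" % (PAYMENT_MODELS,))
    return {
        "connectionConfiguration": {
            "repositoryEndpointMetadata": {"tenantId": tenant_id}
        },
        # Field mappings are omitted in this sketch.
        "repositoryConfigurations": {},
        "additionalProperties": {
            "paymentModel": payment_model,
            "isCrawlChatMessage": True,
            "isCrawlChannelPost": True,
            # Hypothetical pattern: skip temporary files.
            "exclusionFileTypePatterns": [r".*\.tmp$"],
        },
        "type": "MSTEAMS",
        "enableIdentityCrawler": True,
        "syncMode": "CHANGE_LOG",
        "secretArn": secret_arn,
    }
```

Note that the pattern accepts lowercase hexadecimal digits only, so a tenant ID copied in uppercase would need to be lowercased before it passes this check.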

Microsoft Teams JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"tenantId": {

"type": "string",

"pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",

"minLength": 36,

"maxLength": 36

}

},

"required": [

"tenantId"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"chatMessage": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"chatAttachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"channelPost": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"channelWiki": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"channelAttachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"meetingChat": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"meetingFile": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"LONG"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"meetingNote": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"calendarMeeting": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"paymentModel": {

"type": "string",

"enum": [

"A",

"B",

"Evaluation Mode"

]

},

"inclusionTeamNameFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionTeamNameFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionChannelNameFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionChannelNameFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionUserEmailFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionOneNoteSectionNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionOneNoteSectionNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionOneNotePageNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionOneNotePageNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"isCrawlChatMessage": {

"type": "boolean"

},

"isCrawlChatAttachment": {

"type": "boolean"

},

"isCrawlChannelPost": {

"type": "boolean"

},

"isCrawlChannelAttachment": {

"type": "boolean"

},

"isCrawlChannelWiki": {

"type": "boolean"

},

"isCrawlCalendarMeeting": {

"type": "boolean"

},

"isCrawlMeetingChat": {

"type": "boolean"

},

"isCrawlMeetingFile": {

"type": "boolean"

},

"isCrawlMeetingNote": {

"type": "boolean"

},

"startCalendarDateTime": {

"anyOf": [

{

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

},

{

"type": "string",

"pattern": ""

}

]

},

"endCalendarDateTime": {

"anyOf": [

{

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

},

{

"type": "string",

"pattern": ""

}

]

}

},

"required": []

},

"type": {

"type": "string",

"pattern": "MSTEAMS"

},

"enableIdentityCrawler": {

"type": "boolean"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
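In the schema above, startCalendarDateTime and endCalendarDateTime accept a UTC date-time such as 2023-01-15T09:30:00Z. The second anyOf branch uses an empty pattern, which as far as we can tell matches any string, so the schema alone may not reject a malformed value. The following Python sketch produces and checks values in the stricter first form. Both helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern copied from the first anyOf branch of startCalendarDateTime.
CALENDAR_DATETIME = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"
)


def to_calendar_datetime(moment):
    """Format a timezone-aware datetime as a UTC string ending in Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_calendar_datetime(value):
    """Return True if value has the yyyy-MM-ddTHH:mm:ssZ shape."""
    return CALENDAR_DATETIME.fullmatch(value) is not None


# 11:30 at UTC+02:00 is 09:30 in UTC.
local = datetime(2023, 1, 15, 11, 30, tzinfo=timezone(timedelta(hours=2)))
print(to_calendar_datetime(local))  # prints 2023-01-15T09:30:00Z
```

The pattern checks only the shape of the string, so a value such as 2023-02-31T00:00:00Z would still pass is_calendar_datetime even though it is not a real date.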

Microsoft Yammer template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as YAMMER, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Microsoft Yammer JSON schema.

The following table describes the parameters of the Microsoft Yammer JSON schema.

Configuration                 Description

connectionConfiguration       Configuration information for the data source.

repositoryEndpointMetadata    The endpoint information for the data source. This data source does not specify an endpoint in repositoryEndpointMetadata. Rather, the connection information is included in an AWS Secrets Manager secret whose ARN you provide in secretArn.

repositoryConfigurations      Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.

• community                   A list of objects that map attributes or field names of Microsoft Yammer content to Amazon Kendra index field names. For more information, see Mapping data source fields.
• user
• message
• attachment

additionalProperties          Additional configuration options for your content in your data source.

inclusionPatterns             A list of regular expression patterns to include certain files in your Microsoft Yammer data source. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion and exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.

exclusionPatterns             A list of regular expression patterns to exclude certain files in your Microsoft Yammer data source. Files that match the patterns are excluded from the index. Files that don't match the patterns are included in the index. If a file matches both an exclusion and inclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.

sinceDate                     You can choose to configure a sinceDate parameter so that the Microsoft Yammer connector crawls content based on a specific sinceDate.

communityNameFilter           You can choose to index specific community content.

• isCrawlMessage              true to crawl messages, message attachments, and private messages.
• isCrawlAttachment
• isCrawlPrivateMessage

type                          Specify YAMMER as your data source type.

secretArn                     The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to Microsoft Yammer. This includes your Microsoft Yammer user name and password, and the client ID and client secret that are generated when you create an OAuth application in the Azure portal.

useChangeLog                  true to use the Microsoft Yammer change log to determine which documents require updating in the index.

syncMode                      Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:
                              • FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.
                              • FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.
                              • CHANGE_LOG to index only new and modified content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

enableIdentityCrawler         true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.
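A Microsoft Yammer template built from these parameters might look like the following Python sketch. The function name and the chosen crawl flags are hypothetical, and field mappings are left out for brevity. Two details come from the JSON schema in the next section: repositoryEndpointMetadata is an empty object, and useChangeLog is typed as the string "true" or "false" rather than as a Boolean.

```python
def build_yammer_template(secret_arn, since_date, use_change_log=False):
    """Assemble a template dict using the field names from the schema."""
    return {
        # No endpoint is specified for Yammer; connection details live in the secret.
        "connectionConfiguration": {"repositoryEndpointMetadata": {}},
        # Field mappings are omitted in this sketch.
        "repositoryConfigurations": {},
        "additionalProperties": {
            "sinceDate": since_date,  # the only required additional property
            "isCrawlMessage": True,
            "isCrawlAttachment": True,
            "isCrawlPrivateMessage": False,
        },
        "type": "YAMMER",
        "secretArn": secret_arn,
        # The schema defines useChangeLog as a string enum, not a Boolean.
        "useChangeLog": "true" if use_change_log else "false",
        "syncMode": "FULL_CRAWL",
        "enableIdentityCrawler": True,
    }


template = build_yammer_template(
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:yammer-example",
    "2023-06-01T00:00:00",
    use_change_log=True,
)
print(template["useChangeLog"])  # prints true
```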

Microsoft Yammer JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"community": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"user": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"message": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"attachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"inclusionPatterns": {

"type": "array"

},

"exclusionPatterns": {

"type": "array"

},

"sinceDate": {

"type": "string",

"pattern": "^(19|2[0-9])[0-9]{2}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])T(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])((\\+|-)(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]))?$"

},

"communityNameFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"isCrawlMessage": {

"type": "boolean"

},

"isCrawlAttachment": {

"type": "boolean"

},

"isCrawlPrivateMessage": {

"type": "boolean"

}

},

"required": [

"sinceDate"

]

},

"type": {

"type": "string",

"pattern": "YAMMER"

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

},

"useChangeLog": {

"type": "string",

"enum": [

"true",

"false"

]

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"enableIdentityCrawler": {

"type": "boolean"

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

}

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"additionalProperties",

"type",

"secretArn",

"syncMode"

]

}
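The sinceDate pattern in the schema above is dense, so a quick check against sample values can help. The following Python sketch copies the pattern, with the JSON escape \\+ written as \+ in a raw string, and tests candidate values. The function name is hypothetical.

```python
import re

# sinceDate pattern copied from the Microsoft Yammer JSON schema.
SINCE_DATE = re.compile(
    r"^(19|2[0-9])[0-9]{2}-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])"
    r"T(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])"
    r"((\+|-)(0[0-9]|1[0-9]|2[0-3]):([0-5][0-9]))?$"
)


def is_valid_since_date(value):
    """Return True if value matches the schema's sinceDate pattern."""
    return SINCE_DATE.fullmatch(value) is not None


for candidate in (
    "2023-06-01T00:00:00",        # no offset: accepted
    "2023-06-01T00:00:00+05:30",  # numeric offset: accepted
    "2023-06-01T00:00:00Z",       # trailing Z: not accepted by this pattern
    "2023-13-01T00:00:00",        # month 13: not accepted
):
    print(candidate, is_valid_since_date(candidate))
```

Unlike the date-time patterns used by some other templates, this pattern does not accept a trailing Z. A UTC value is written either without an offset or with +00:00.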

MySQL template schema

You include a JSON that contains the data source schema as part of the TemplateConfiguration object. Specify the type of data source as JDBC, the database type as mysql, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See MySQL JSON schema.

The following table describes the parameters of the MySQL JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata Required conﬁguration information for

connecting your data source.

• dbType—The type of Java database that

you use, whether mysql, db2, postgresq

l , oracle, or sqlserver .

• dbHost—The database host name.

• dbPort—The database port.

• dbInstance—The database instance.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

Specify the type of data source and the secret

ARN.

document A list of objects that map the attributes or

ﬁeld names of your database content to

Amazon Kendra index ﬁeld names. For more

information, see Mapping data source ﬁelds.

additionalProperties Additional conﬁguration options for your

content in your data source. Use to include or

exclude speciﬁc content in your database data

source.

primaryKey Provide the primary key for the database

table. This identiﬁes a table within your

database.

titleColumn Provide the name of the document title

column within your database table.

bodyColumn Provide the name of the document body

column within your database table.

sqlQuery Enter SQL query statements like SELECT and

JOIN operations. SQL queries must be less

than 32KB. Amazon Kendra will crawl all

database content that matches your query.

timestampColumn Enter the name of the column which contains

time stamps. Amazon Kendra uses time stamp

information to detect changes in your content

and sync only changed content.

timestampFormat Enter the name of the column which contains

time stamp formats to use to detect content

changes and re-sync your content.

timezone Enter the name of the column which contains

time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon

Kendra will use to detect content changes.

Amazon Kendra will re-index content when

there is a change in any of these columns.

allowedUsersColumn Enter the name of the column which contains

user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column which contains

groups to be allowed access to content.

sourceURIColumn Enter the name of the column which contains

Source URLs to be indexed.

isSslEnabled true to use an SSL connection to your

database.

type

The type of data source. Specify JDBC as your

data source type.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

•

FORCED_FULL_CRAWL to freshly index all

content, replacing existing content each

time your data source syncs with your index.

•

FULL_CRAWL to index only new, modiﬁed

and deleted content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

•

CHANGE_LOG to index only new and

modiﬁed content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

secretArn The Amazon Resource Name (ARN) of a

Secrets Manager secret that contains user

name and password required to connect to

your database. The secret must contain a

JSON structure with the following keys:

{

"user name": "database user name",

"password": " password"

}

version The version of the template that is currently

supported.
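
The sqlQuery constraints in the table can be checked before the template is submitted. The sketch below is a hypothetical pre-flight check, not part of Amazon Kendra. It assumes "less than 32KB" means fewer than 32 * 1024 bytes of UTF-8 text, and it also rejects semicolons, because the JSON schema for this template disallows sqlQuery values that match the pattern ";+".

```python
MAX_SQL_QUERY_BYTES = 32 * 1024  # assumption: 32 KB measured as UTF-8 bytes


def check_sql_query(sql_query):
    """Return a list of problems found in a candidate sqlQuery value."""
    problems = []
    if len(sql_query.encode("utf-8")) >= MAX_SQL_QUERY_BYTES:
        problems.append("query must be less than 32 KB")
    # The schema's "not": {"pattern": ";+"} rejects any value containing ';'.
    if ";" in sql_query:
        problems.append("query must not contain a semicolon")
    return problems


print(check_sql_query("SELECT id, title, body FROM docs"))
print(check_sql_query("SELECT 1; DROP TABLE docs"))
```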

MySQL JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
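
For orientation, here is a minimal sketch of a configuration that would supply every key the schema above marks as required. The host, port, instance, column names, index field name, and secret ARN are all made-up placeholders, and "STRING" is assumed to be an acceptable indexFieldType. The check at the end only looks at top-level keys and is not a full JSON Schema validation.

```python
# Top-level keys listed under "required" in the MySQL JSON schema.
REQUIRED_TOP_LEVEL = [
    "connectionConfiguration",
    "repositoryConfigurations",
    "syncMode",
    "additionalProperties",
    "secretArn",
    "type",
]

# Placeholder values throughout; replace with your own database details.
mysql_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "dbType": "mysql",
            "dbHost": "db.example.com",
            "dbPort": "3306",  # the schema types dbPort as a string
            "dbInstance": "exampledb",
        }
    },
    "repositoryConfigurations": {
        "document": {
            "fieldMappings": [
                {
                    "indexFieldName": "example_title",
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "title",
                }
            ]
        }
    },
    "additionalProperties": {
        "primaryKey": "id",
        "titleColumn": "title",
        "bodyColumn": "body",
        "sqlQuery": "SELECT id, title, body FROM docs",
    },
    "type": "JDBC",
    "syncMode": "FULL_CRAWL",
    "secretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
}

missing = [key for key in REQUIRED_TOP_LEVEL if key not in mysql_template]
print("missing top-level keys:", missing)
```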

Oracle Database template schema

Include a JSON that contains the data source schema as part of the TemplateConfiguration

object. Specify the type of data source as JDBC, the database type as oracle, a secret for your

authentication credentials, and any other necessary configurations. Then specify TEMPLATE as the

Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Oracle Database JSON schema.

The following table describes the parameters of the Oracle Database JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata Required conﬁguration information for

connecting your data source.

• dbType—The type of Java database that

you use, whether mysql, db2, postgresql,

oracle, or sqlserver.

• dbHost—The database host name.

• dbPort—The database port.

• dbInstance—The database instance.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

Specify the type of data source and the secret

ARN.

document A list of objects that map the attributes or

ﬁeld names of your database content to

Amazon Kendra index ﬁeld names. For more

information, see Mapping data source ﬁelds.

additionalProperties Additional conﬁguration options for your

content in your data source. Use to include or

exclude speciﬁc content in your database data

source.

primaryKey Provide the primary key for the database

table. This identiﬁes a table within your

database.

titleColumn Provide the name of the document title

column within your database table.

bodyColumn Provide the name of the document body

column within your database table.

sqlQuery Enter SQL query statements like SELECT and

JOIN operations. SQL queries must be less

than 32KB. Amazon Kendra will crawl all

database content that matches your query.

timestampColumn Enter the name of the column which contains

time stamps. Amazon Kendra uses time stamp

information to detect changes in your content

and sync only changed content.

timestampFormat Enter the name of the column which contains

time stamp formats to use to detect content

changes and re-sync your content.

timezone Enter the name of the column which contains

time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon

Kendra will use to detect content changes.

Amazon Kendra will re-index content when

there is a change in any of these columns.

allowedUsersColumn Enter the name of the column which contains

user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column which contains

groups to be allowed access to content.

sourceURIColumn Enter the name of the column which contains

Source URLs to be indexed.

isSslEnabled true to use an SSL connection to your

database.

type

The type of data source. Specify JDBC as your

data source type.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

•

FORCED_FULL_CRAWL to freshly index all

content, replacing existing content each

time your data source syncs with your index.

•

FULL_CRAWL to index only new, modiﬁed

and deleted content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

•

CHANGE_LOG to index only new and

modiﬁed content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

secretArn The Amazon Resource Name (ARN) of a

Secrets Manager secret that contains user

name and password required to connect to

your database. The secret must contain a

JSON structure with the following keys:

{

"user name": "database user name",

"password": " password"

}

version The version of the template that is currently

supported.
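
The secret described in the table is a small JSON document. The sketch below shows one way to produce that string in Python. The key names are copied from the table as printed, the function name build_database_secret is hypothetical, and the credentials are placeholders. Storing the string in Secrets Manager is outside the scope of this sketch.

```python
import json


def build_database_secret(user_name, password):
    """Serialize database credentials using the key names shown in the table."""
    return json.dumps({"user name": user_name, "password": password})


# Placeholder credentials for illustration only.
secret_string = build_database_secret("example_user", "example_password")
print(secret_string)
```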

Oracle Database JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
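
The repositoryEndpointMetadata object in the schema above requires four string values, and dbType must be one of the listed database types. A small helper along these lines (hypothetical, with placeholder host and instance names) could catch two easy mistakes before submission: an unsupported dbType, and a numeric port where the schema expects a string.

```python
# Values allowed by the dbType enum in the schema.
DB_TYPES = ("mysql", "db2", "postgresql", "oracle", "sqlserver")


def build_endpoint_metadata(db_type, db_host, db_port, db_instance):
    """Build repositoryEndpointMetadata, converting the port to a string."""
    if db_type not in DB_TYPES:
        raise ValueError("unsupported dbType: %r" % (db_type,))
    return {
        "dbType": db_type,
        "dbHost": db_host,
        "dbPort": str(db_port),  # the schema declares dbPort as a string
        "dbInstance": db_instance,
    }


# Placeholder connection details.
oracle_endpoint = build_endpoint_metadata("oracle", "oracle.example.com", 1521, "ORCL")
print(oracle_endpoint)
```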

PostgreSQL template schema

Include a JSON that contains the data source schema as part of the TemplateConfiguration

object. Specify the type of data source as JDBC, the database type as postgresql, a secret for

your authentication credentials, and any other necessary configurations. Then specify TEMPLATE as

the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See PostgreSQL JSON schema.

The following table describes the parameters of the PostgreSQL JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata Required conﬁguration information for

connecting your data source.

• dbType—The type of Java database that

you use, whether mysql, db2, postgresql,

oracle, or sqlserver.

• dbHost—The database host name.

• dbPort—The database port.

• dbInstance—The database instance.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

Specify the type of data source and the secret

ARN.

document A list of objects that map the attributes or

ﬁeld names of your database content to

Amazon Kendra index ﬁeld names. For more

information, see Mapping data source ﬁelds.

additionalProperties Additional conﬁguration options for your

content in your data source. Use to include or

exclude speciﬁc content in your database data

source.

primaryKey Provide the primary key for the database

table. This identiﬁes a table within your

database.

titleColumn Provide the name of the document title

column within your database table.

bodyColumn Provide the name of the document body

column within your database table.

sqlQuery Enter SQL query statements like SELECT and

JOIN operations. SQL queries must be less

than 32KB. Amazon Kendra will crawl all

database content that matches your query.

timestampColumn Enter the name of the column which contains

time stamps. Amazon Kendra uses time stamp

information to detect changes in your content

and sync only changed content.

timestampFormat Enter the name of the column which contains

time stamp formats to use to detect content

changes and re-sync your content.

timezone Enter the name of the column which contains

time zones for the content to be crawled.

changeDetectingColumns Enter the names of the columns that Amazon

Kendra will use to detect content changes.

Amazon Kendra will re-index content when

there is a change in any of these columns.

allowedUsersColumn Enter the name of the column which contains

user IDs to be allowed access to content.

allowedGroupsColumn Enter the name of the column which contains

groups to be allowed access to content.

sourceURIColumn Enter the name of the column which contains

Source URLs to be indexed.

isSslEnabled true to use an SSL connection to your

database.

type

The type of data source. Specify JDBC as your

data source type.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

•

FORCED_FULL_CRAWL to freshly index all

content, replacing existing content each

time your data source syncs with your index.

•

FULL_CRAWL to index only new, modiﬁed

and deleted content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

•

CHANGE_LOG to index only new and

modiﬁed content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

secretArn The Amazon Resource Name (ARN) of a

Secrets Manager secret that contains user

name and password required to connect to

your database. The secret must contain a

JSON structure with the following keys:

{

"user name": "database user name",

"password": " password"

}

version The version of the template that is currently

supported.
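
The three syncMode values in the table differ in how much content is re-indexed and whether deletions are picked up. The following sketch encodes that description as a small decision helper. The function choose_sync_mode and its two flags are hypothetical and only restate the table; they are not part of the template.

```python
# Summary of each mode, paraphrased from the table.
SYNC_MODES = {
    "FORCED_FULL_CRAWL": "freshly index all content on every sync",
    "FULL_CRAWL": "index only new, modified, and deleted content",
    "CHANGE_LOG": "index only new and modified content",
}


def choose_sync_mode(reindex_everything, track_deletions):
    """Pick a syncMode value from two yes/no requirements."""
    if reindex_everything:
        return "FORCED_FULL_CRAWL"
    return "FULL_CRAWL" if track_deletions else "CHANGE_LOG"


mode = choose_sync_mode(reindex_everything=False, track_deletions=True)
print(mode, "-", SYNC_MODES[mode])
```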

PostgreSQL JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"dbType": {

"type": "string",

"enum": [

"mysql",

"db2",

"postgresql",

"oracle",

"sqlserver"

]

},

"dbHost": {

"type": "string"

},

"dbPort": {

"type": "string"

},

"dbInstance": {

"type": "string"

}

},

"required": [

"dbType",

"dbHost",

"dbPort",

"dbInstance"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"document": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string"

},

"dataSourceFieldName": {

"type": "string"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"primaryKey": {

"type": "string"

},

"titleColumn": {

"type": "string"

},

"bodyColumn": {

"type": "string"

},

"sqlQuery": {

"type": "string",

"not": {

"pattern": ";+"

}

},

"timestampColumn": {

"type": "string"

},

"timestampFormat": {

"type": "string"

},

"timezone": {

"type": "string"

},

"changeDetectingColumns": {

"type": "array",

"items": {

"type": "string"

}

},

"allowedUsersColumn": {

"type": "string"

},

"allowedGroupsColumn": {

"type": "string"

},

"sourceURIColumn": {

"type": "string"

},

"isSslEnabled": {

"type": "boolean"

}

},

"required": ["primaryKey", "titleColumn", "bodyColumn", "sqlQuery"]

},

"type" : {

"type" : "string",

"pattern": "JDBC"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
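
Each entry in fieldMappings in the schema above needs indexFieldName, indexFieldType, and dataSourceFieldName. When many columns are mapped, generating the list from a dictionary can reduce copy-and-paste errors. The sketch below is one possible approach. The helper name, the column names, the index field names, and the default type "STRING" are all illustrative assumptions.

```python
def build_field_mappings(column_to_index_field, index_field_type="STRING"):
    """Turn {database column: index field} pairs into fieldMappings entries."""
    return [
        {
            "indexFieldName": index_field,
            "indexFieldType": index_field_type,
            "dataSourceFieldName": column,
        }
        for column, index_field in column_to_index_field.items()
    ]


# Hypothetical PostgreSQL columns mapped to hypothetical index fields.
mappings = build_field_mappings({"author": "example_author", "category": "example_category"})
for entry in mappings:
    print(entry)
```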

Salesforce template schema

Include a JSON that contains the data source schema as part of the TemplateConfiguration

object. Provide the Salesforce host URL as part of the connection configuration (the repository

endpoint details). Also specify the type of data source as SALESFORCEV2, a secret for your

authentication credentials, and any other necessary configurations. Then specify TEMPLATE as the

Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Salesforce JSON schema.
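
The Salesforce JSON schema later in this section constrains hostUrl with the pattern "https:.*". The sketch below applies that same pattern in Python as a quick local check. The function name and the example URLs are hypothetical. JSON Schema patterns are not implicitly anchored, so re.search is used here rather than a full match.

```python
import re

# Pattern copied from the hostUrl property in the Salesforce JSON schema.
HOST_URL_PATTERN = re.compile(r"https:.*")


def is_valid_host_url(url):
    """Return True if the URL satisfies the schema's hostUrl pattern."""
    return HOST_URL_PATTERN.search(url) is not None


# Placeholder URLs for illustration.
print(is_valid_host_url("https://example.my.salesforce.com"))
print(is_valid_host_url("http://example.my.salesforce.com"))
```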

The following table describes the parameters of the Salesforce JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

hostUrl The URL of the Salesforce instance to be

indexed.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• account

• contact

• campaign

• case

• product

• lead

• contract

• partner

• profile

• idea

• pricebook

• task

• solution

• attachment

• user

• document

• knowledgeArticles

• group

• opportunity

• chatter

• customEntity

A list of objects that map the attributes or

field names of your Salesforce entities to

Amazon Kendra index field names. For more

information, see Mapping data source fields.

secretARN The Amazon Resource Name (ARN) of an AWS

Secrets Manager secret that contains the

key-value pairs required to connect to your

Salesforce. The secret must contain a JSON

structure with the following keys:

{

"authenticationUrl": "OAUTH endpoint that Amazon Kendra connects to get an OAUTH token",

"consumerKey": "Application public key generated when you created your Salesforce application",

"consumerSecret": "Application private key generated when you created your Salesforce application",

"password": "Password associated with the user logging in to the Salesforce instance",

"securityToken": "Token associated with the user account logging in to the Salesforce instance",

"username": "User name of the user logging in to the Salesforce instance"

}

additionalProperties Additional conﬁguration options for your

content in your data source.

• accountFilter

• contactFilter

• caseFilter

• campaignFilter

• contractFilter

• groupFilter

• leadFilter

• productFilter

• opportunityFilter

• partnerFilter

• pricebookFilter

• ideaFilter

• proﬁleFilter

• taskFilter

• solutionFilter

• userFilter

• chatterFilter

• documentFilter

• knowledgeArticleFilter

• customEntities

A collection of strings that speciﬁes which

entities to ﬁlter.

inclusionPatterns

• inclusionDocumentFileTypePatterns

• inclusionDocumentFileNamePatterns

• inclusionAccountFileTypePatterns

• inclusionCampaignFileTypePatterns

• inclusionDocumentFileNamePatterns

• inclusionCampaignFileNamePatterns

• inclusionCaseFileTypePatterns

• inclusionCaseFileNamePatterns

• inclusionContactFileTypePatterns

• inclusionContractFileNamePatterns

• inclusionLeadFileTypePatterns

• inclusionLeadFileNamePatterns

• inclusionOpportunityFileTypePatterns

• inclusionOpportunityFileNamePatterns

• inclusionSolutionFileTypePatterns

• inclusionSolutionFileNamePatterns

• inclusionTaskFileTypePatterns

• inclusionTaskFileNamePatterns

• inclusionGroupFileTypePatterns

• inclusionGroupFileNamePatterns

• inclusionChatterFileTypePatterns

• inclusionChatterFileNamePatterns

• inclusionCustomEntityFileTypePatterns

• inclusionCustomEntityFileNamePatterns

A list of regular expression patterns to include

certain ﬁles in your Salesforce data source.

Files that match the patterns are included in

the index. Files that don't match the patterns

are excluded from the index. If a ﬁle matches

both an inclusion and exclusion pattern, the

exclusion pattern takes precedence and the

ﬁle isn't included in the index.

exclusionPatterns

• exclusionDocumentFileTypePatterns

• exclusionDocumentFileNamePatterns

• exclusionAccountFileTypePatterns

• exclusionCampaignFileTypePatterns

• exclusionCampaignFileNamePatterns

• exclusionCaseFileTypePatterns

• exclusionCaseFileNamePatterns

• exclusionContactFileTypePatterns

• exclusionContractFileNamePatterns

• exclusionLeadFileTypePatterns

• exclusionLeadFileNamePatterns

• exclusionOpportunityFileTypePatterns

• exclusionOpportunityFileNamePatterns

• exclusionSolutionFileTypePatterns

• exclusionSolutionFileNamePatterns

• exclusionTaskFileTypePatterns

• exclusionTaskFileNamePatterns

• exclusionGroupFileTypePatterns

• exclusionGroupFileNamePatterns

• exclusionChatterFileTypePatterns

• exclusionChatterFileNamePatterns

• exclusionCustomEntityFileTypePatterns

• exclusionCustomEntityFileNamePatterns

A list of regular expression patterns to exclude

certain ﬁles in your Salesforce data source.

Files that match the patterns are excluded

from the index. Files that don't match the

patterns are included in the index. If a ﬁle

matches both an exclusion and inclusion

pattern, the exclusion pattern takes precedence

and the file isn't included in the index.

• isCrawlAccount

• isCrawlContact

• isCrawlCase

• isCrawlCampaign

• isCrawlProduct

• isCrawlLead

• isCrawlContract

• isCrawlPartner

• isCrawlProﬁle

• isCrawlIdea

• isCrawlPricebook

• isCrawlDocument

• crawlSharedDocument

• isCrawlGroup

• isCrawlOpportunity

• isCrawlChatter

• isCrawlUser

• isCrawlSolution

• isCrawlTask

• isCrawlAccountAttachments

• isCrawlContactAttachments

• isCrawlCaseAttachments

• isCrawlCampaignAttachments

• isCrawlLeadAttachments

• isCrawlContractAttachments

• isCrawlGroupAttachments

• isCrawlOpportunityAttachments

• isCrawlChatterAttachments

• isCrawlSolutionAttachments

• isCrawlTaskAttachments

• isCrawlCustomEntityAttachments

• isCrawlKnowledgeArticles

• isCrawlDraft

• isCrawlPublish

• isCrawlArchived

true to crawl these types of files in your

Salesforce account.

type

The type of data source. Specify SALESFORCEV2

as your data source type.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler

to sync identity/principal information on users

and groups with access to certain documents.

If identity crawler is turned oﬀ, all documents

can be publicly searched. If you want to use

access control for your documents and identity

crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user

and group access information.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

•

FORCED_FULL_CRAWL to freshly index all

content, replacing existing content each

time your data source syncs with your index.

•

FULL_CRAWL to index only new, modiﬁed

and deleted content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

•

CHANGE_LOG to index only new and

modiﬁed content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

version The version of this template that is currently

supported.
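
The precedence rule for inclusion and exclusion patterns in the table can be sketched in a few lines of Python. This is an illustrative model of the described behavior, not the connector's implementation. It assumes Python regular expression semantics with re.search, and it assumes that an empty inclusion list means no inclusion filter is applied. The function name and file names are hypothetical.

```python
import re


def is_indexed(file_name, inclusion_patterns, exclusion_patterns):
    """Decide whether a file would be indexed under the described rules."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, file_name) for pattern in exclusion_patterns):
        return False
    # With inclusion patterns present, only matching files are indexed.
    if inclusion_patterns:
        return any(re.search(pattern, file_name) for pattern in inclusion_patterns)
    # Assumption: no inclusion patterns means nothing is filtered out here.
    return True


print(is_indexed("report.pdf", [r"\.pdf$"], []))
print(is_indexed("draft-report.pdf", [r"\.pdf$"], [r"^draft-"]))
print(is_indexed("notes.txt", [r"\.pdf$"], []))
```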

Salesforce JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties":

{

"connectionConfiguration": {

"type": "object",

"properties":

{

"repositoryEndpointMetadata":

{

"type": "object",

"properties":

{

"hostUrl":

{

"type": "string",

"pattern": "https:.*"

}

},

"required":

[

"hostUrl"

]

}

},

"required":

[

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties":

{

"account":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"contact":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"campaign":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"case":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"product":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"lead":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"contract":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"partner":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"profile":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"idea":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"pricebook":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"task":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"solution":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"attachment":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"user":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"document":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"knowledgeArticles":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"group":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"opportunity":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE",

"LONG"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"chatter":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

},

"customEntity":

{

"type": "object",

"properties":

{

"fieldMappings":

{

"type": "array",

"items":

[

{

"type": "object",

"properties":

{

"indexFieldName":

{

"type": "string"

},

"indexFieldType":

{

"type": "string",

"enum":

[

"STRING",

"STRING_LIST",

"DATE"

]

},

"dataSourceFieldName":

{

"type": "string"

},

"dateFieldFormat":

{

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required":

[

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required":

[

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties":

{

"accountFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"contactFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"caseFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"campaignFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"contractFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"groupFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"leadFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"productFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"opportunityFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"partnerFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"pricebookFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"ideaFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"profileFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"taskFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"solutionFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"userFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"chatterFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"documentFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"knowledgeArticleFilter":{

"type": "array",

"items":

{

"type": "string"

}

},

"customEntities":{

"type": "array",

"items":

{

"type": "string"

}

},

"isCrawlAccount": {

"type": "boolean"

},

"isCrawlContact": {

"type": "boolean"

},

"isCrawlCase": {

"type": "boolean"

},

"isCrawlCampaign": {

"type": "boolean"

},

"isCrawlProduct": {

"type": "boolean"

},

"isCrawlLead": {

"type": "boolean"

},

"isCrawlContract": {

"type": "boolean"

},

"isCrawlPartner": {

"type": "boolean"

},

"isCrawlProfile": {

"type": "boolean"

},

"isCrawlIdea": {

"type": "boolean"

},

"isCrawlPricebook": {

"type": "boolean"

},

"isCrawlDocument": {

"type": "boolean"

},

"crawlSharedDocument": {

"type": "boolean"

},

"isCrawlGroup": {

"type": "boolean"

},

"isCrawlOpportunity": {

"type": "boolean"

},

"isCrawlChatter": {

"type": "boolean"

},

"isCrawlUser": {

"type": "boolean"

},

"isCrawlSolution":{

"type": "boolean"

},

"isCrawlTask":{

"type": "boolean"

},

"isCrawlAccountAttachments": {

"type": "boolean"

},

"isCrawlContactAttachments": {

"type": "boolean"

},

"isCrawlCaseAttachments": {

"type": "boolean"

},

"isCrawlCampaignAttachments": {

"type": "boolean"

},

"isCrawlLeadAttachments": {

"type": "boolean"

},

"isCrawlContractAttachments": {

"type": "boolean"

},

"isCrawlGroupAttachments": {

"type": "boolean"

},

"isCrawlOpportunityAttachments": {

"type": "boolean"

},

"isCrawlChatterAttachments": {

"type": "boolean"

},

"isCrawlSolutionAttachments":{

"type": "boolean"

},

"isCrawlTaskAttachments":{

"type": "boolean"

},

"isCrawlCustomEntityAttachments":{

"type": "boolean"

},

"isCrawlKnowledgeArticles": {

"type": "object",

"properties":

{

"isCrawlDraft": {

"type": "boolean"

},

"isCrawlPublish": {

"type": "boolean"

},

"isCrawlArchived": {

"type": "boolean"

}

}

},

"inclusionDocumentFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionDocumentFileTypePatterns": {

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionDocumentFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionDocumentFileNamePatterns": {

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionAccountFileTypePatterns": {

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionAccountFileTypePatterns": {

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionAccountFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionAccountFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionCampaignFileTypePatterns": {

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionCampaignFileTypePatterns": {

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionCampaignFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionCampaignFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionCaseFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionCaseFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionCaseFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionCaseFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionContactFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionContactFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionContactFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionContactFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionContractFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionContractFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionContractFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionContractFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionLeadFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionLeadFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionLeadFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionLeadFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionOpportunityFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionOpportunityFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionOpportunityFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionOpportunityFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionSolutionFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionSolutionFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionSolutionFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionSolutionFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionTaskFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionTaskFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionTaskFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionTaskFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionGroupFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionGroupFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionGroupFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionGroupFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionChatterFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionChatterFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionChatterFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionChatterFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionCustomEntityFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionCustomEntityFileTypePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"inclusionCustomEntityFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"exclusionCustomEntityFileNamePatterns":{

"type": "array",

"items":

{

"type": "string"

}

},

"required":

[]

},

"enableIdentityCrawler": {

"type": "boolean"

},

"type": {

"type": "string",

"pattern": "SALESFORCEV2"

},

"syncMode": {

"type": "string",

"enum": [

"FULL_CRAWL",

"FORCED_FULL_CRAWL",

"CHANGE_LOG"

]

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
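As an illustration of the field-mapping rules in the Salesforce schema above, the following Python sketch checks one fieldMappings entry for the three required keys and for an allowed indexFieldType. It is a minimal sketch, not part of Amazon Kendra: the function name check_field_mapping and the sample field names are hypothetical. Because the schema lists LONG only for some entities (for example campaign, lead, and idea), the allowed types are passed in rather than hard-coded.

```python
REQUIRED_KEYS = ("indexFieldName", "indexFieldType", "dataSourceFieldName")


def check_field_mapping(mapping, allowed_types):
    """Return a list of problems found in one fieldMappings entry."""
    problems = []
    for key in REQUIRED_KEYS:
        # The schema declares each required key with "type": "string".
        if not isinstance(mapping.get(key), str):
            problems.append(f"missing or non-string key: {key}")
    field_type = mapping.get("indexFieldType")
    if isinstance(field_type, str) and field_type not in allowed_types:
        problems.append(f"indexFieldType {field_type!r} is not allowed here")
    return problems


# Entities such as "case" list STRING, STRING_LIST, and DATE;
# entities such as "campaign" additionally list LONG.
CASE_TYPES = {"STRING", "STRING_LIST", "DATE"}
CAMPAIGN_TYPES = CASE_TYPES | {"LONG"}

# Hypothetical mapping; the field names are made up for illustration.
mapping = {
    "indexFieldName": "sf_budget",
    "indexFieldType": "LONG",
    "dataSourceFieldName": "BudgetedCost",
}

print(check_field_mapping(mapping, CAMPAIGN_TYPES))  # LONG is listed for campaign
print(check_field_mapping(mapping, CASE_TYPES))      # LONG is not listed for case
```

A check like this only mirrors the "required" and "enum" keywords shown in the schema; it does not confirm that the named fields exist in your Salesforce organization or in your index.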

ServiceNow template schema

You include a JSON document that contains the data source schema as part of the TemplateConfiguration object. You provide the ServiceNow host URL, authentication type, and instance version as part of the connection configuration or repository endpoint details. Also specify the type of data source as SERVICENOWV2, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See ServiceNow JSON schema.
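The following Python sketch shows one way the pieces described above could fit together: a ServiceNow template expressed as a dictionary, wrapped in a TemplateConfiguration, with TEMPLATE as the Type. All concrete values (the data source name, index ID, role ARN, secret ARN, and field names) are hypothetical placeholders, and the sketch only builds and prints the request rather than calling AWS.

```python
import json

# Hypothetical placeholder values; replace them with your own.
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:servicenow-placeholder"
ROLE_ARN = "arn:aws:iam::111122223333:role/kendra-placeholder-role"

template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {
            "hostUrl": "your-domain.service-now.com",
            "authType": "basicAuth",
            "servicenowInstanceVersion": "Tokyo",
        }
    },
    "repositoryConfigurations": {
        "knowledgeArticle": {
            "fieldMappings": [
                {
                    "indexFieldName": "sn_ka_category",    # hypothetical index field
                    "indexFieldType": "STRING",
                    "dataSourceFieldName": "kb_category",  # hypothetical source field
                }
            ]
        }
    },
    "additionalProperties": {"isCrawlKnowledgeArticle": True},
    "type": "SERVICENOWV2",
    "syncMode": "FULL_CRAWL",
    "secretArn": SECRET_ARN,
    "version": "1.0.0",
}

# Parameters for CreateDataSource; Type is TEMPLATE, not SERVICENOWV2.
request = {
    "Name": "my-servicenow-source",      # hypothetical name
    "IndexId": "index-id-placeholder",   # hypothetical index ID
    "RoleArn": ROLE_ARN,
    "Type": "TEMPLATE",
    "Configuration": {"TemplateConfiguration": {"Template": template}},
}

# With boto3 installed and credentials configured, the call would be along
# the lines of: boto3.client("kendra").create_data_source(**request)
print(json.dumps(request, indent=2))
```

Keeping the template as a plain dictionary makes it easy to check against the JSON schema before you send the request.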

The following table describes the parameters of the ServiceNow JSON schema.

Configuration
Description

connectionConfiguration
Configuration information for the endpoint for the data source.

repositoryEndpointMetadata
The endpoint information for the data source.

hostUrl
The ServiceNow host URL. For example, your-domain.service-now.com.

authType
The type of authentication that you use, either basicAuth or OAuth2.

servicenowInstanceVersion
The ServiceNow version that you use. You can choose between Tokyo, Sandiego, Rome, and Others.

repositoryConfigurations
Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.

• knowledgeArticle
• attachment
• serviceCatalog
• incident
A list of objects that map the attributes or field names of your ServiceNow knowledge articles, attachments, service catalog, and incidents to Amazon Kendra index field names. For more information, see Mapping data source fields. The ServiceNow data source field names must exist in your ServiceNow custom metadata.

additionalProperties
Additional configuration options for your content in your data source.

maxFileSizeInMegaBytes
The file size limit, in MB, for files that Amazon Kendra crawls. Amazon Kendra crawls only the files within the size limit that you define. The default file size limit is 50 MB. The limit must be greater than 0 MB and less than or equal to 50 MB.

• knowledgeArticleFilter
• incidentQueryFilter
• serviceCatalogQueryFilter
• knowledgeArticleTitleRegExp
• serviceCatalogTitleRegExp
• incidentTitleRegExp
• inclusionFileTypePatterns
• exclusionFileTypePatterns
• inclusionFileNamePatterns
• exclusionFileNamePatterns
• incidentStateType
A list of regular expression patterns to include or exclude certain files in your ServiceNow data source. Files that match the inclusion patterns are included in the index. Files that don't match the inclusion patterns are excluded from the index. If a file matches both an inclusion and an exclusion pattern, the exclusion pattern takes precedence and the file isn't included in the index.

• isCrawlKnowledgeArticle
• isCrawlKnowledgeArticleAttachment
• includePublicArticlesOnly
• isCrawlServiceCatalog
• isCrawlServiceCatalogAttachment
• isCrawlActiveServiceCatalog
• isCrawlInactiveServiceCatalog
• isCrawlIncident
• isCrawlIncidentAttachment
• isCrawlActiveIncident
• isCrawlInactiveIncident
• applyACLForKnowledgeArticle
• applyACLForServiceCatalog
• applyACLForIncident
true to crawl ServiceNow knowledge articles, service catalogs, incidents, and attachments.

type
The type of data source. Specify SERVICENOWV2 as your data source type.

enableIdentityCrawler
true to use Amazon Kendra's identity crawler to sync identity/principal information on users and groups with access to certain documents. If identity crawler is turned off, all documents can be publicly searched. If you want to use access control for your documents and identity crawler is turned off, you can alternatively use the PutPrincipalMapping API to upload user and group access information.

syncMode
Specify how Amazon Kendra should update your index when your data source content changes. You can choose between:
• FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time your data source syncs with your index.
• FULL_CRAWL to index only new, modified, and deleted content each time your data source syncs with your index. Amazon Kendra can use your data source's mechanism for tracking content changes and index content that changed since the last sync.

secretArn
The Amazon Resource Name (ARN) of an AWS Secrets Manager secret that contains the key-value pairs required to connect to your ServiceNow instance. The secret must contain a JSON structure with the following keys:
{
    "username": "user name",
    "password": "password"
}
If you use OAuth2 authentication, your secret must contain a JSON structure with the following keys:
{
    "username": "user name",
    "password": "password",
    "clientId": "client id",
    "clientSecret": "client secret"
}

version
The version of the template that is currently supported.
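The precedence rule for inclusion and exclusion patterns described in the table can be sketched in a few lines of Python. This is an illustration of the rule only, not Amazon Kendra's implementation: the function name is_indexed is hypothetical, and two details are assumptions, namely that patterns may match anywhere in the name (re.search) and that an empty inclusion list filters nothing out.

```python
import re


def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Apply the inclusion/exclusion precedence rule to one file name."""
    if any(re.search(p, name) for p in exclusion_patterns):
        # Exclusion wins, even if an inclusion pattern also matches.
        return False
    if inclusion_patterns:
        return any(re.search(p, name) for p in inclusion_patterns)
    # Assumption: with no inclusion patterns, nothing is filtered out.
    return True


inclusion = [r"\.pdf$"]
exclusion = [r"^draft-"]

print(is_indexed("report.pdf", inclusion, exclusion))        # matches inclusion only
print(is_indexed("draft-report.pdf", inclusion, exclusion))  # matches both, so excluded
print(is_indexed("notes.txt", inclusion, exclusion))         # matches neither
```

Checking the exclusion list first is what gives it precedence: a name that matches both lists never reaches the inclusion test.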

ServiceNow JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"hostUrl": {

"type": "string",

"pattern": "^(?!(^(https?|ftp|file):\/\/))[a-z0-9-]+(.service-now.com|.servicenowservices.com)$",

"minLength": 1,

"maxLength": 2048

},

"authType": {

"type": "string",

"enum": [

"basicAuth",

"OAuth2"

]

},

"servicenowInstanceVersion": {

"type": "string",

"enum": [

"Tokyo",

"Sandiego",

"Rome",

"Others"

]

}

},

"required": [

"hostUrl",

"authType",

"servicenowInstanceVersion"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"knowledgeArticle": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"STRING_LIST"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"attachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"LONG",

"DATE",

"STRING_LIST"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"serviceCatalog": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"STRING_LIST"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

},

"incident": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": [

"STRING",

"DATE",

"STRING_LIST"

]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"additionalProperties": {

"type": "object",

"properties": {

"maxFileSizeInMegaBytes": {

"type": "string"

},

"isCrawlKnowledgeArticle": {

"type": "boolean"

},

"isCrawlKnowledgeArticleAttachment": {

"type": "boolean"

},

"includePublicArticlesOnly": {

"type": "boolean"

},

"knowledgeArticleFilter": {

"type": "string"

},

"incidentQueryFilter": {

"type": "string"

},

"serviceCatalogQueryFilter": {

"type": "string"

},

"isCrawlServiceCatalog": {

"type": "boolean"

},

"isCrawlServiceCatalogAttachment": {

"type": "boolean"

},

"isCrawlActiveServiceCatalog": {

"type": "boolean"

},

"isCrawlInactiveServiceCatalog": {

"type": "boolean"

},

"isCrawlIncident": {

"type": "boolean"

},

"isCrawlIncidentAttachment": {

"type": "boolean"

},

"isCrawlActiveIncident": {

"type": "boolean"

},

"isCrawlInactiveIncident": {

"type": "boolean"

},

"applyACLForKnowledgeArticle": {

"type": "boolean"

},

"applyACLForServiceCatalog": {

"type": "boolean"

},

"applyACLForIncident": {

"type": "boolean"

},

"incidentStateType": {

"type": "array",

"items": {

"type": "string",

"enum": [

"Open",

"Open - Unassigned",

"Resolved",

"All"

]

}

},

"knowledgeArticleTitleRegExp": {

"type": "string"

},

"serviceCatalogTitleRegExp": {

"type": "string"

},

"incidentTitleRegExp": {

"type": "string"

},

"inclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileTypePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"exclusionFileNamePatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"required": []

},

"type": {

"type": "string",

"pattern": "SERVICENOWV2"

},

"enableIdentityCrawler": {

"type": "boolean"

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL"

]

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type"

]

}
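Before submitting a ServiceNow template, it can help to check a few values locally against the constraints stated in the schema and the parameter table. The Python sketch below reuses the hostUrl pattern and length limits from the schema, the 0 to 50 MB bound on maxFileSizeInMegaBytes, and the secret keys listed for each authType. The function names are hypothetical, and treating the file size as a possibly fractional number is an assumption.

```python
import re

# Pattern copied from the hostUrl property of the schema above.
HOST_URL_PATTERN = re.compile(
    r"^(?!(^(https?|ftp|file):\/\/))[a-z0-9-]+(.service-now.com|.servicenowservices.com)$"
)

# Keys the secret must contain for each authentication type.
SECRET_KEYS = {
    "basicAuth": {"username", "password"},
    "OAuth2": {"username", "password", "clientId", "clientSecret"},
}


def check_host_url(host_url):
    """True if the host URL satisfies the schema's length and pattern rules."""
    return 1 <= len(host_url) <= 2048 and HOST_URL_PATTERN.search(host_url) is not None


def check_max_file_size(value):
    """The schema types this value as a string; it must be > 0 and <= 50."""
    try:
        size = float(value)  # assumption: fractional sizes are acceptable
    except (TypeError, ValueError):
        return False
    return 0 < size <= 50


def missing_secret_keys(auth_type, secret):
    """Return the required keys that the secret dictionary lacks."""
    return sorted(SECRET_KEYS[auth_type] - set(secret))


print(check_host_url("your-domain.service-now.com"))
print(check_host_url("https://your-domain.service-now.com"))
print(check_max_file_size("50"))
print(missing_secret_keys("OAuth2", {"username": "u", "password": "p"}))
```

Note that the host URL must be given without a scheme such as https://, which is what the negative lookahead at the start of the pattern enforces.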

Slack template schema

You include a JSON document that contains the data source schema as part of the TemplateConfiguration object. You provide the Slack team ID as part of the connection configuration or repository endpoint details. Also specify the type of data source as SLACK, a secret for your authentication credentials, and other necessary configurations. You then specify TEMPLATE as the Type when you call CreateDataSource.

You can use the template provided in this developer guide. See Slack JSON schema.

The following table describes the parameters of the Slack JSON schema.

Configuration
Description

connectionConfiguration
Configuration information for the endpoint for the data source.

repositoryEndpointMetadata
The endpoint information for the data source.

teamId
The Slack team ID you copied from your Slack main page URL.

repositoryConfigurations
Configuration information for the content of the data source. For example, configuring specific types of content and field mappings.

All
A list of objects that map the attributes or field names of your Slack content to Amazon Kendra index field names.

additionalProperties
Additional configuration options for your content in your data source.

Data source template schemas 558

Amazon Kendra Developer Guide

Conﬁguration Description

inclusionPatterns A list of regular expression patterns to
include specific content in your Slack data
source. Content that matches the patterns
is included in the index. Content that
doesn't match the patterns is excluded from
the index. If any content matches both an
inclusion and exclusion pattern, the exclusion
pattern takes precedence, and the content
isn't included in the index.

exclusionPatterns A list of regular expression patterns to exclude
specific content in your Slack data source.
Content that matches the patterns is
excluded from the index. Content that doesn't
match the patterns is included in the index.
If any content matches both an inclusion and
exclusion pattern, the exclusion pattern takes
precedence, and the content isn't included in
the index.

crawlBotMessages

true to crawl bot messages.

excludeArchived

true to exclude crawling of archived

messages.

conversationType The types of conversation that you want
to index: PUBLIC_CHANNEL, PRIVATE_CHANNEL,
GROUP_MESSAGE, or DIRECT_MESSAGE. You can
specify more than one.

channelFilter The channels that you want to index,
listed under private_channel or public_channel.

sinceDate
You can choose to configure a sinceDate
parameter so that the Slack connector crawls
content based on a specific sinceDate, given in
the form 2023-01-01T00:00:00Z.

lookBack
You can choose to configure a lookBack
parameter so that the Slack connector crawls
updated or deleted content up to a specified
number of hours before your last connector
sync.

syncMode Specify how Amazon Kendra should update

your index when your data source content

changes. You can choose between:

•

FORCED_FULL_CRAWL to freshly index all

content, replacing existing content each

time your data source syncs with your index.

•

FULL_CRAWL to index only new, modiﬁed

and deleted content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

•

CHANGE_LOG to index only new and

modiﬁed content each time your data

source syncs with your index. Amazon

Kendra can use your data source's

mechanism for tracking content changes

and index content that changed since the

last sync.

type

The type of data source. Specify SLACK as

your data source type.

enableIdentityCrawler

true to use Amazon Kendra's identity crawler

to sync identity/principal information on users

and groups with access to certain documents.

If identity crawler is turned oﬀ, all documents

can be publicly searched. If you want to use

access control for your documents and identity

crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user

and group access information.

secretArn The Amazon Resource Name (ARN) of an AWS
Secrets Manager secret that contains the
key-value pairs required to connect to Slack.
The secret must contain a JSON structure with
the following keys:

{
"slackToken": "token"
}

version The version of this template that's currently

supported.
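
The precedence rule for inclusionPatterns and exclusionPatterns in the table above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the connector's implementation. The function name is hypothetical, and the use of re.search (a match anywhere in the name) and the treatment of an empty inclusion list as "include everything" are assumptions.

```python
import re


def is_indexed(name, inclusion_patterns, exclusion_patterns):
    """Return True if content called `name` would be indexed under the rules above."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, name) for pattern in exclusion_patterns):
        return False
    # When inclusion patterns are given, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(pattern, name) for pattern in inclusion_patterns)
    # Assumption: with no inclusion patterns, everything not excluded is indexed.
    return True


inclusion = [r"^release"]
exclusion = [r"draft"]
for name in ["release-notes", "release-draft", "standup"]:
    print(name, is_indexed(name, inclusion, exclusion))
```

Here "release-draft" matches both lists and is left out because the exclusion pattern takes precedence, while "standup" is left out because it matches no inclusion pattern.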

Slack JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"teamId": {

"type": "string"

}

},

"required": ["teamId"]

}

}

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"All": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "DATE","LONG"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "yyyy-MM-dd'T'HH:mm:ss'Z'"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

},

"required": [

"fieldMappings"

]

}

},

"required": [

]

},

"additionalProperties": {

"type": "object",

"properties": {

"exclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"inclusionPatterns": {

"type": "array",

"items": {

"type": "string"

}

},

"crawlBotMessages": {

"type": "boolean"

},

"excludeArchived": {

"type": "boolean"

},

"conversationType": {

"type": "array",

"items": {

"type": "string",

"enum": [

"PUBLIC_CHANNEL",

"PRIVATE_CHANNEL",

"GROUP_MESSAGE",

"DIRECT_MESSAGE"

]

}

},

"channelFilter": {

"type": "object",

"properties": {

"private_channel": {

"type": "array",

"items": {

"type": "string"

}

},

"public_channel": {

"type": "array",

"items": {

"type": "string"

}

}

}

},

"channelIdFilter": {

"type": "array",

"items": {

"type": "string"

}

},

"sinceDate": {

"anyOf": [

{

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$"

},

{

"type": "string",

"pattern": ""

}

]

},

"lookBack": {

"type": "string",

"pattern": "^[0-9]*$"

}

},

"required": [

]

},

"syncMode": {

"type": "string",

"enum": [

"FORCED_FULL_CRAWL",

"FULL_CRAWL",

"CHANGE_LOG"

]

},

"type" : {

"type" : "string",

"pattern": "SLACK"

},

"enableIdentityCrawler": {

"type": "boolean"

},

"secretArn": {

"type": "string"

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"syncMode",

"additionalProperties",

"secretArn",

"type",

"enableIdentityCrawler"

]

}
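
To make the schema more concrete, here is a minimal Python sketch of a Slack template that supplies every key in the schema's required list, plus a small check of the sinceDate and lookBack patterns. The team ID, secret ARN, and field-mapping names are hypothetical placeholders, the helper is illustrative only, and including a top-level version key is an assumption. The check covers only the rules shown here, not full JSON Schema validation.

```python
import re

# Patterns copied from the schema above.
SINCE_DATE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$")
LOOK_BACK = re.compile(r"^[0-9]*$")
REQUIRED = [
    "connectionConfiguration", "repositoryConfigurations", "syncMode",
    "additionalProperties", "secretArn", "type", "enableIdentityCrawler",
]

slack_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"teamId": "T0123ABCD"}  # placeholder team ID
    },
    "repositoryConfigurations": {
        "All": {
            "fieldMappings": [
                {
                    "indexFieldName": "_last_updated_at",
                    "indexFieldType": "DATE",
                    "dataSourceFieldName": "last_updated",  # hypothetical source field
                    "dateFieldFormat": "yyyy-MM-dd'T'HH:mm:ss'Z'",
                }
            ]
        }
    },
    "additionalProperties": {
        "crawlBotMessages": False,
        "excludeArchived": True,
        "conversationType": ["PUBLIC_CHANNEL", "PRIVATE_CHANNEL"],
        "sinceDate": "2023-01-01T00:00:00Z",
        "lookBack": "24",  # a string of digits, per the schema
    },
    "syncMode": "FULL_CRAWL",
    "type": "SLACK",
    "enableIdentityCrawler": True,
    "secretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:placeholder",
    "version": "1.0.0",  # assumption: version is supplied at the top level
}


def check_slack_template(template):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"missing required key: {key}" for key in REQUIRED if key not in template]
    extra = template.get("additionalProperties", {})
    since_date = extra.get("sinceDate", "")
    # The schema also accepts an empty string for sinceDate.
    if since_date and not SINCE_DATE.match(since_date):
        problems.append(f"sinceDate has an unexpected format: {since_date}")
    if not LOOK_BACK.match(extra.get("lookBack", "")):
        problems.append(f"lookBack must contain only digits: {extra['lookBack']}")
    return problems


print(check_slack_template(slack_template))
```

A check like this catches formatting slips, such as a sinceDate without the time portion, before the template is submitted.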

Zendesk template schema

You include a JSON document that contains the data source schema as part of the TemplateConfiguration object.
You provide the host URL as part of the connection configuration or repository endpoint details.
Also specify the type of data source as ZENDESK, a secret for your authentication credentials,
and other necessary configurations. You then specify TEMPLATE as the Type when you call
CreateDataSource.

You can use the template provided in this developer guide. See Zendesk JSON schema.

The following table describes the parameters of the Zendesk JSON schema.

Conﬁguration Description

connectionConﬁguration Conﬁguration information for the endpoint

for the data source.

repositoryEndpointMetadata The endpoint information for the data source.

hostUrl The Zendesk host URL. For example,
https://yoursubdomain.zendesk.com.

repositoryConﬁgurations Conﬁguration information for the content

of the data source. For example, conﬁguring

speciﬁc types of content and ﬁeld mappings.

• ticket

• ticketComment

• ticketCommentAttachment

• article

• articleComment

• articleAttachment

• communityTopic

• communityPostComment

A list of objects that map attributes or ﬁeld

names of Zendesk tickets to Amazon Kendra

index ﬁeld names. For more information, see

Mapping data source ﬁelds.

secretArn The Amazon Resource Name (ARN) of an
AWS Secrets Manager secret that contains
the key-value pairs required to connect to
Zendesk. The secret must contain a
JSON structure with the following keys: host
URL, client ID, client secret, user name, and
password.

additionalProperties Additional configuration options for your
content in your data source.

organizationNameFilter You can choose to index tickets that exist

within a speciﬁc Organization.

sinceDate
You can choose to configure a sinceDate
parameter so that the Zendesk connector
crawls content based on a specific sinceDate,
given in the form 2023-01-01 00:00:00.

inclusionPatterns A list of regular expression patterns to include

certain ﬁles in your Zendesk data source. Files

that match the patterns are included in the

index. Files that don't match the patterns are

excluded from the index. If a ﬁle matches

both an inclusion and exclusion pattern, the

exclusion pattern takes precedence, and the

ﬁle isn't included in the index.

exclusionPatterns A list of regular expression patterns to exclude

certain ﬁles in your Zendesk data source. Files

that match the patterns are excluded from the

index. Files that don't match the patterns are

included in the index. If a ﬁle matches both an

exclusion and inclusion pattern, the exclusion

pattern takes precedence, and the ﬁle isn't

included in the index.

• isCrawlTicket

• isCrawlTicketComment

• isCrawlTicketCommentAttachment

• isCrawlArticle

• isCrawlArticleComment

• isCrawlArticleAttachment

• isCrawlCommunityTopic

• isCrawlCommunityPost

• isCrawlCommunityPostComment

Input "true" to crawl these types of content.

type

Specify ZENDESK as your data source type.

useChangeLog

Input "true" to use the Zendesk change log to

determine which documents require updating

in the index. Depending on the change log's

size, it might be faster to scan the documents

in Zendesk. If you are syncing your Zendesk

data source with your index for the ﬁrst time,

all documents are scanned.

Zendesk JSON schema

{

"$schema": "http://json-schema.org/draft-04/schema#",

"type": "object",

"properties": {

"connectionConfiguration": {

"type": "object",

"properties": {

"repositoryEndpointMetadata": {

"type": "object",

"properties": {

"hostUrl": {

"type": "string",

"pattern": "https:.*"

}

},

"required": [

"hostUrl"

]

}

},

"required": [

"repositoryEndpointMetadata"

]

},

"repositoryConfigurations": {

"type": "object",

"properties": {

"ticket": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

},

"ticketComment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

},

"ticketCommentAttachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

},

"article": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

},

"communityPostComment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

},

"articleComment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

},

"articleAttachment": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

},

"communityTopic": {

"type": "object",

"properties": {

"fieldMappings": {

"type": "array",

"items": {

"anyOf": [

{

"type": "object",

"properties": {

"indexFieldName": {

"type": "string"

},

"indexFieldType": {

"type": "string",

"enum": ["STRING", "STRING_LIST", "LONG", "DATE"]

},

"dataSourceFieldName": {

"type": "string"

},

"dateFieldFormat": {

"type": "string",

"pattern": "dd-MM-yyyy HH:mm:ss"

}

},

"required": [

"indexFieldName",

"indexFieldType",

"dataSourceFieldName"

]

}

]

}

}

},

"required": [

"fieldMappings"

]

}

}

},

"secretArn": {

"type": "string",

"minLength": 20,

"maxLength": 2048

},

"additionalProperties": {

"type": "object",

"properties": {

"organizationNameFilter": {

"type": "array"

},

"sinceDate": {

"type": "string",

"pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$"

},

"inclusionPatterns": {

"type": "array"

},

"exclusionPatterns": {

"type": "array"

},

"isCrawTicket": {

"type": "string"

},

"isCrawTicketComment": {

"type": "string"

},

"isCrawTicketCommentAttachment": {

"type": "string"

},

"isCrawlArticle": {

"type": "string"

},

"isCrawlArticleAttachment": {

"type": "string"

},

"isCrawlArticleComment": {

"type": "string"

},

"isCrawlCommunityTopic": {

"type": "string"

},

"isCrawlCommunityPost": {

"type": "string"

},

"isCrawlCommunityPostComment": {

"type": "string"

}

}

},

"type": {

"type": "string",

"pattern": "ZENDESK"

},

"useChangeLog": {

"type": "string",

"enum": ["true", "false"]

}

},

"version": {

"type": "string",

"anyOf": [

{

"pattern": "1.0.0"

}

]

},

"additionalProperties": false,

"required": [

"connectionConfiguration",

"repositoryConfigurations",

"additionalProperties",

"useChangeLog",

"secretArn",

"type"

]

}
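
Two details of the Zendesk schema are easy to miss: sinceDate uses a space-separated date and time rather than the ISO-style form used by Slack, and the isCrawl and useChangeLog settings are typed as strings rather than booleans. The following Python sketch builds a template with those details in mind. The helper names, secret ARN, and field-mapping names are hypothetical, and the time zone in which the connector interprets sinceDate is not stated here, so the example simply formats a naive datetime.

```python
import re
from datetime import datetime

# Pattern copied from the schema above.
ZENDESK_SINCE_DATE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$")


def zendesk_since_date(moment):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS', the shape the schema pattern accepts."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def as_flag(value):
    """Convert a Python bool to the string form the schema expects."""
    return "true" if value else "false"


zendesk_template = {
    "connectionConfiguration": {
        "repositoryEndpointMetadata": {"hostUrl": "https://yoursubdomain.zendesk.com"}
    },
    "repositoryConfigurations": {
        "article": {
            "fieldMappings": [
                {
                    "indexFieldName": "article_created_at",  # hypothetical index field
                    "indexFieldType": "DATE",
                    "dataSourceFieldName": "created_at",  # hypothetical source field
                    "dateFieldFormat": "dd-MM-yyyy HH:mm:ss",
                }
            ]
        }
    },
    "additionalProperties": {
        "sinceDate": zendesk_since_date(datetime(2023, 1, 1, 0, 0, 0)),
        "isCrawlArticle": as_flag(True),
        "isCrawlCommunityTopic": as_flag(False),
    },
    "secretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:placeholder",
    "type": "ZENDESK",
    "useChangeLog": as_flag(False),
}

since_date = zendesk_template["additionalProperties"]["sinceDate"]
print(since_date, bool(ZENDESK_SINCE_DATE.match(since_date)))
```

Routing every flag through one small helper avoids accidentally mixing Python booleans with the string values the schema declares.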

Adobe Experience Manager

Adobe Experience Manager is a content management system that's used for creating website or

mobile app content. You can use Amazon Kendra to connect to Adobe Experience Manager and

index your pages and content assets.

Amazon Kendra supports Adobe Experience Manager (AEM) as a Cloud Service author instances and
Adobe Experience Manager On-Premise author and publish instances.

You can connect Amazon Kendra to your Adobe Experience Manager data source using the Amazon

Kendra console or the TemplateConﬁguration API.

For troubleshooting your Amazon Kendra Adobe Experience Manager data source connector, see

Troubleshooting data sources.

Topics

• Supported features

• Prerequisites

• Connection instructions

Supported features

The Adobe Experience Manager data source connector supports the following features:

• Field mappings

• User access control

• Inclusion/exclusion ﬁlters

• Full and incremental content syncs

• OAuth 2.0 and basic authentication

• Virtual private cloud (VPC)

Prerequisites

Before you can use Amazon Kendra to index your Adobe Experience Manager data source, make

these changes in your Adobe Experience Manager and AWS accounts.

In Adobe Experience Manager, make sure you have:

• Access to an account with administrative privileges, or an admin user.

• Copied your Adobe Experience Manager host URL.

Note

(On-premise/server) Amazon Kendra checks if the endpoint information included in
AWS Secrets Manager is the same as the endpoint information specified in your data source
configuration details. This helps protect against the confused deputy problem, which
is a security issue where a user doesn’t have permission to perform an action but uses
Amazon Kendra as a proxy to access the configured secret and perform the action. If
you later change your endpoint information, you must create a new secret to sync this
information.

• Noted your basic authentication credentials of admin user name and password.

Note

We recommend that you regularly refresh or rotate your credentials and secret. Provide

only the necessary access level for your own security. We do not recommend that you

re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0

(where applicable).

• Optional: Conﬁgured OAuth 2.0 credentials in Adobe Experience Manager (AEM) as a Cloud

Service or AEM On-Premise. If you use AEM On-Premise, the credentials include client ID, client

secret, and private key. If you use AEM as a Cloud Service, the credentials include client ID,

client secret, private key, organization ID, technical account ID, and Adobe Identity Management

System (IMS) host. For more information about how to generate these credentials for AEM as

a Cloud Service, see Adobe Experience Manager documentation. For AEM On-Premise, Adobe

Granite OAuth 2.0 server implementation (com.adobe.granite.oauth.server) provides the support

for OAuth 2.0 server functionalities in AEM.

• Checked that each document is unique in Adobe Experience Manager and across other data sources
you plan to use for the same index. Data sources that you use for the same index must not
contain the same document. Document IDs are global to an index and
must be unique per index.

In your AWS account, make sure you have:

• Created an Amazon Kendra index and, if using the API, noted the index ID.

• Created an IAM role for your data source and, if using the API, noted the ARN of the IAM role.

Note

If you change your authentication type and credentials, you must update your IAM role to

access the correct AWS Secrets Manager secret ID.

• Stored your Adobe Experience Manager authentication credentials in an AWS Secrets Manager

secret and, if using the API, noted the ARN of the secret.

Note

We recommend that you regularly refresh or rotate your credentials and secret. Provide

only the necessary access level for your own security. We do not recommend that you

re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0

(where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role

and Secrets Manager secret when you connect your Adobe Experience Manager data source to

Amazon Kendra. If you are using the API, you must provide the ARN of an existing IAM role and

Secrets Manager secret, and an index ID.

Connection instructions

To connect Amazon Kendra to your Adobe Experience Manager data source, you must provide

the necessary details of your Adobe Experience Manager data source so that Amazon Kendra can

access your data. If you have not yet conﬁgured Adobe Experience Manager for Amazon Kendra,

see Prerequisites.

Console

To connect Amazon Kendra to Adobe Experience Manager

1. Sign in to the AWS Management Console and open the Amazon Kendra console.

2. From the left navigation pane, choose Indexes and then choose the index you want to use

from the list of indexes.

Note

You can choose to conﬁgure or edit your User access control settings under Index

settings.

3. On the Getting started page, choose Add data source.

4. On the Add data source page, choose Adobe Experience Manager connector, and

then choose Add connector. If using version 2 (if applicable), choose Adobe Experience

Manager connector with the "V2.0" tag.

5. On the Specify data source details page, enter the following information:

a. In Name and description, for Data source name—Enter a name for your data source.

You can include hyphens but not spaces.

b. (Optional) Description—Enter an optional description for your data source.

c. In Default language—Choose a language to ﬁlter your documents for the index.

Unless you specify otherwise, the language defaults to English. Language speciﬁed in

the document metadata overrides the selected language.

d. In Tags, for Add new tag—Include optional tags to search and ﬁlter your resources or

track your AWS costs.

e. Choose Next.

6. On the Deﬁne access and security page, enter the following information:

a. Source—Choose either AEM On-Premise or AEM as a Cloud Service.

Enter your Adobe Experience Manager host URL. For example, if you use AEM
On-Premise, you include the hostname and port: https://hostname:port. Or, if you
use AEM as a Cloud Service, you can use the author URL:
https://author-xxxxxx-xxxxxxx.adobeaemcloud.com.

b. SSL certiﬁcate location—Enter the path to the SSL certiﬁcate stored in an Amazon S3

bucket. You use this to connect to AEM On-Premise with a secure SSL connection.

c. Authorization—Turn on or oﬀ access control list (ACL) information for your

documents, if you have an ACL and want to use it for access control. The ACL specifies
which documents users and groups can access. The ACL information is used to

ﬁlter search results based on the user or their group access to documents. For more

information, see User context ﬁltering.

d. Authentication—Choose Basic authentication or OAuth 2.0 authentication. Then

choose an existing AWS Secrets Manager secret or create a new secret to store your

Adobe Experience Manager credentials. If you choose to create a new secret, an AWS

Secrets Manager secret window opens.

If you chose Basic authentication, enter a name for the secret, the Adobe Experience

Manager site user name and password. The user must have admin permission or be an

admin user.

If you chose OAuth 2.0 authentication and you use AEM On-Premise, enter a name

for the secret, client ID, client secret, and private key. If you use AEM as a Cloud

Service, enter a name for the secret, client ID, client secret, private key, organization ID,

technical account ID, and Adobe Identity Management System (IMS) host.

Save and add your secret.

e. Virtual Private Cloud (VPC)—You can choose to use a VPC. If so, you must add

Subnets and VPC security groups.

f. Identity crawler—Specify whether to turn on Amazon Kendra’s identity crawler. The

identity crawler uses the access control list (ACL) information for your documents to

ﬁlter search results based on the user or their group access to documents. If you have

an ACL for your documents and choose to use your ACL, you can then also choose

to turn on Amazon Kendra’s identity crawler to conﬁgure user context ﬁltering of

search results. Otherwise, if identity crawler is turned oﬀ, all documents can be publicly

searched. If you want to use access control for your documents and identity crawler is

turned oﬀ, you can alternatively use the PutPrincipalMapping API to upload user and

group access information for user context ﬁltering.

g. IAM role—Choose an existing IAM role or create a new IAM role to access your

repository credentials and index content.

Note

IAM roles used for indexes cannot be used for data sources. If you are unsure if

an existing role is used for an index or FAQ, choose Create a new role to avoid

errors.

h. Choose Next.

7. On the Conﬁgure sync settings page, enter the following information:

a. Sync scope—Set limits for crawling certain content types, page components, and root
paths, and filter content using regular expression patterns.

i. Content types—Choose whether to crawl only pages or assets, or both.

ii. (Optional) Additional conﬁguration—Conﬁgure the following settings:

• Page components—The speciﬁc names of page components. The Page

Component is an extensible page component designed to work with the Adobe

Experience Manager template editor and allows page header/footer and

structure components to be assembled with the template editor.

• Content fragment variations—The speciﬁc names of content fragment

variations. Content Fragments allow you to design, create, curate and publish

page-independent content in Adobe Experience Manager. They allow you to

prepare content ready for use in multiple locations/over multiple channels.

• Root paths—The root paths to speciﬁc content.

• Regex patterns—The regular expression patterns to include or exclude certain

pages and assets.

b. Sync mode—Choose how you want to update your index when your data source

content changes. When you sync your data source with Amazon Kendra for the ﬁrst

time, all content is crawled and indexed by default. You must run a full sync of your

data if your initial sync failed, even if you don't choose full sync as your sync mode

option.

• Full sync: Freshly index all content, replacing existing content each time your data

source syncs with your index.

• New, modiﬁed sync: Index only new and modiﬁed content each time your data

source syncs with your index. Amazon Kendra can use your data source's mechanism

for tracking content changes and index content that changed since the last sync.

• New, modiﬁed, deleted sync: Index only new, modiﬁed, and deleted content each

time your data source syncs with your index. Amazon Kendra can use your data

source's mechanism for tracking content changes and index content that changed

since the last sync.

c. Time zone ID—If you use AEM On-Premise and the time zone of your server is

diﬀerent than the time zone of the Amazon Kendra AEM connector or index, you can

specify the server time zone to align with the AEM connector or index. The default

time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM connector

or index. The default time zone for AEM as a Cloud Service is Greenwich Mean Time.

d. Sync run schedule, for Frequency—Choose how often to sync your data source

content and update your index.

e. Choose Next.

8. On the Set ﬁeld mappings page, enter the following information:

a. Select from the Amazon Kendra generated default data source ﬁelds you want to map

to your index. To add custom data source ﬁelds, create an index ﬁeld name to map to

and the ﬁeld data type.

b. Choose Next.

9. On the Review and create page, check that the information you have entered is correct

and then select Add data source. You can also choose to edit your information from this

page. Your data source will appear on the Data sources page after the data source has been

added successfully.
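
The three console sync modes in step 7 carry the same descriptions as the syncMode values used by the template schemas in this guide. The Python sketch below records that pairing. The pairing is inferred from the matching descriptions rather than stated explicitly, and the dictionary and function names are hypothetical.

```python
# Console label -> syncMode value, paired by their matching descriptions.
CONSOLE_TO_API_SYNC_MODE = {
    "Full sync": "FORCED_FULL_CRAWL",            # re-index all content every sync
    "New, modified, deleted sync": "FULL_CRAWL",  # new, modified, and deleted content
    "New, modified sync": "CHANGE_LOG",           # new and modified content only
}


def api_sync_mode(console_label):
    """Look up the syncMode value that corresponds to a console sync mode label."""
    try:
        return CONSOLE_TO_API_SYNC_MODE[console_label]
    except KeyError:
        raise ValueError(f"unknown sync mode label: {console_label!r}") from None


print(api_sync_mode("New, modified sync"))
```

Note that FULL_CRAWL, despite its name, corresponds to the incremental mode that also tracks deletions, while FORCED_FULL_CRAWL is the mode that re-indexes everything.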

API

To connect Amazon Kendra to Adobe Experience Manager

You must specify a JSON of the data source schema using the TemplateConﬁguration API. You

must provide the following information:

•

Data source—Specify the data source type as AEM when you use the TemplateConﬁguration

JSON schema. Also specify the data source as TEMPLATE when you call the CreateDataSource

API.

• AEM host URL—Specify the Adobe Experience Manager host URL. For example, if you

use AEM On-Premise, you include the hostname and port: https://hostname:port. Or,

if you use AEM as a Cloud Service, you can use the author URL:
https://author-xxxxxx-xxxxxxx.adobeaemcloud.com.

• Sync mode—Specify how Amazon Kendra should update your index when your data source

content changes. When you sync your data source with Amazon Kendra for the ﬁrst time, all

content is crawled and indexed by default. You must run a full sync of your data if your initial

sync failed, even if you don't choose full sync as your sync mode option. You can choose

between:

•

FORCED_FULL_CRAWL to freshly index all content, replacing existing content each time

your data source syncs with your index.

•

FULL_CRAWL to index only new, modiﬁed, and deleted content each time your data source

syncs with your index. Amazon Kendra can use your data source’s mechanism for tracking

content changes and index content that changed since the last sync.

•

CHANGE_LOG to index only new and modiﬁed content each time your data source syncs

with your index. Amazon Kendra can use your data source’s mechanism for tracking content

changes and index content that changed since the last sync.

•

Authentication type—Specify which type of authentication you want to use, either Basic or

OAuth2.

•

AEM type—Specify which type of Adobe Experience Manager you use, either CLOUD or

ON_PREMISE.

• Secret Amazon Resource Name (ARN)—If you want to use basic authentication for either

AEM On-Premise or Cloud, you provide a secret that stores your authentication credentials

of your user name and password. You provide the Amazon Resource Name (ARN) of an AWS

Secrets Manager secret. The secret is stored in a JSON structure with the following keys:

{

"aemUrl": "Adobe Experience Manager On-Premise host URL",

"username": "user name with admin permissions",

"password": "password with admin permissions"

}

If you want to use OAuth 2.0 authentication for AEM On-Premise, the secret is stored in a

JSON structure with the following keys:

{

"aemUrl": "Adobe Experience Manager host URL",

"clientId": "client ID",

"clientSecret": "client secret",

"privateKey": "private key"

}

If you want to use OAuth 2.0 authentication for AEM as a Cloud Service, the secret is stored in

a JSON structure with the following keys:

{

"clientId": "client ID",

"clientSecret": "client secret",

"privateKey": "private key",

"orgId": "organization ID",

"technicalAccountId": "technical account ID",

"imsHost": "Adobe Identity Management System (IMS) host"

}

•

IAM role—Specify RoleArn when you call CreateDataSource to provide an IAM role with

permissions to access your Secrets Manager secret and to call the required public APIs for the

Adobe Experience Manager connector and Amazon Kendra. For more information, see IAM

roles for Adobe Experience Manager data sources.
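The three secret structures above can be checked before they are stored. The following is a minimal Python sketch, not part of the connector: the function name build_aem_secret and the lookup table are hypothetical, and only the key names and the Basic/OAuth2 and CLOUD/ON_PREMISE values are taken from the text above.

```python
import json

# Key sets copied from the JSON structures described above.
# The (authentication type, AEM type) lookup itself is an illustrative assumption.
REQUIRED_KEYS = {
    ("Basic", "ON_PREMISE"): {"aemUrl", "username", "password"},
    ("Basic", "CLOUD"): {"aemUrl", "username", "password"},
    ("OAuth2", "ON_PREMISE"): {"aemUrl", "clientId", "clientSecret", "privateKey"},
    ("OAuth2", "CLOUD"): {
        "clientId", "clientSecret", "privateKey",
        "orgId", "technicalAccountId", "imsHost",
    },
}


def build_aem_secret(auth_type, aem_type, **values):
    """Return the secret as a JSON string, or raise ValueError on a key mismatch."""
    required = REQUIRED_KEYS[(auth_type, aem_type)]
    missing = required - set(values)
    unexpected = set(values) - required
    if missing or unexpected:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(unexpected)}"
        )
    return json.dumps(values)


# Example with placeholder values for basic authentication.
secret_string = build_aem_secret(
    "Basic", "ON_PREMISE",
    aemUrl="https://aem.example.com",
    username="example-user",
    password="example-password",
)
```

The resulting string is what you would store as the secret value in AWS Secrets Manager. Validating locally first surfaces a misspelled key before the first sync.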

You can also add the following optional features:

• Virtual Private Cloud (VPC)—Specify VpcConfiguration when you call

CreateDataSource. For more information, see Configuring Amazon Kendra to use an

Amazon VPC.

• Time zone ID—If you use AEM On-Premise and the time zone of your server is different from

the time zone of the Amazon Kendra AEM connector or index, you can specify the server time

zone to align with the AEM connector or index.

The default time zone for AEM On-Premise is the time zone of the Amazon Kendra AEM

connector or index. The default time zone for AEM as a Cloud Service is Greenwich Mean

Time.

For information about the supported time zone IDs, see Adobe Experience Manager JSON

schema.

• Inclusion and exclusion ﬁlters—Specify whether to include or exclude certain pages and

assets.

Note

Most data sources use regular expression patterns, which are inclusion or exclusion

patterns referred to as ﬁlters. If you specify an inclusion ﬁlter, only content that

matches the inclusion ﬁlter is indexed. Any document that doesn’t match the

inclusion ﬁlter isn’t indexed. If you specify an inclusion and exclusion ﬁlter, documents

that match the exclusion ﬁlter are not indexed, even if they match the inclusion ﬁlter.

• Identity crawler—Specify whether to turn on Amazon Kendra’s identity crawler. The identity

crawler uses the access control list (ACL) information for your documents to ﬁlter search

results based on the user or their group access to documents. If you have an ACL for your

documents and choose to use your ACL, you can then also choose to turn on Amazon

Kendra’s identity crawler to conﬁgure user context ﬁltering of search results. Otherwise,

if identity crawler is turned oﬀ, all documents can be publicly searched. If you want to use

access control for your documents and identity crawler is turned oﬀ, you can alternatively use

the PutPrincipalMapping API to upload user and group access information for user context

ﬁltering.

• Field mappings—Choose to map your Adobe Experience Manager data source ﬁelds to your

Amazon Kendra index ﬁelds. For more information, see Mapping data source ﬁelds.

Note

The document body ﬁeld or the document body equivalent for your documents

is required in order for Amazon Kendra to search your documents. You must

map your document body ﬁeld name in your data source to the index ﬁeld name

_document_body. All other ﬁelds are optional.

For a list of other important JSON keys to conﬁgure, see Adobe Experience Manager template

schema.
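The inclusion and exclusion filter rules described in the note above can be sketched in a few lines of Python. This is an illustration of the stated precedence only, not the connector's implementation. The function name is_indexed is hypothetical, and using re.search (a match anywhere in the path) is an assumption, because the text does not say how patterns are anchored.

```python
import re


def is_indexed(path, inclusion_patterns=(), exclusion_patterns=()):
    """Apply the filter rules from the note above to one document path."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, path) for pattern in exclusion_patterns):
        return False
    # With at least one inclusion filter, only matching content is indexed.
    if inclusion_patterns:
        return any(re.search(pattern, path) for pattern in inclusion_patterns)
    # With no filters at all, nothing is filtered out.
    return True


# Placeholder patterns: include PDF files, exclude anything under /drafts/.
include = [r"\.pdf$"]
exclude = [r"/drafts/"]

results = {
    "/content/dam/report.pdf": is_indexed("/content/dam/report.pdf", include, exclude),
    "/content/drafts/report.pdf": is_indexed("/content/drafts/report.pdf", include, exclude),
    "/content/dam/notes.docx": is_indexed("/content/dam/notes.docx", include, exclude),
}
```

In this example only /content/dam/report.pdf passes: the draft is excluded despite matching the inclusion pattern, and the .docx file matches no inclusion pattern.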

Alfresco

Alfresco is a content management service that helps customers store and manage their content.

You can use Amazon Kendra to index your Alfresco Document library, Wiki, and Blog.

Amazon Kendra supports Alfresco On-Premises and Alfresco Cloud (Platform as a Service).

You can connect Amazon Kendra to your Alfresco data source using the Amazon Kendra console or

the TemplateConfiguration API.

For troubleshooting your Amazon Kendra Alfresco data source connector, see Troubleshooting data

sources.

Topics

• Supported features

• Prerequisites

• Connection instructions

• Learn more

Supported features

The Amazon Kendra Alfresco data source connector supports the following features:

• Field mappings

• User access control

• Inclusion/exclusion ﬁlters

• Full and incremental content syncs

• OAuth 2.0 and basic authentication

• Virtual private cloud (VPC)

Prerequisites

Before you can use Amazon Kendra to index your Alfresco data source, make these changes in your

Alfresco and AWS accounts.

In Alfresco, make sure you have:

• Copied your Alfresco repository URL and web application URL. If you only want to index a

speciﬁc Alfresco site, then also copy the site ID.

• Noted your Alfresco authentication credentials, which include a user name and password with at

least read permissions. If you want to use OAuth 2.0 authentication, you should add the user to

the Alfresco administrators group.

Note

We recommend that you regularly refresh or rotate your credentials and secret. Provide

only the necessary access level for your own security. We do not recommend that you

re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0

(where applicable).

• Optional: Conﬁgured OAuth 2.0 credentials in Alfresco. The credentials include client ID, client

secret, and token URL. For more information on how to configure clients for Alfresco

On-Premises, see the Alfresco documentation. If you use Alfresco Cloud (PaaS), you must contact

Hyland support for Alfresco OAuth 2.0 authentication.

• Checked that each document is unique in Alfresco and across other data sources you plan to use

for the same index. The data sources that you use for an index must not contain the same

document. Document IDs are global to an index and must be unique per index.

In your AWS account, make sure you have:

• Created an Amazon Kendra index and, if using the API, noted the index ID.

• Created an IAM role for your data source and, if using the API, noted the ARN of the IAM role.

Note

If you change your authentication type and credentials, you must update your IAM role to

access the correct AWS Secrets Manager secret ID.

• Stored your Alfresco authentication credentials in an AWS Secrets Manager secret and, if using

the API, noted the ARN of the secret.

Note

We recommend that you regularly refresh or rotate your credentials and secret. Provide

only the necessary access level for your own security. We do not recommend that you

re-use credentials and secrets across data sources, and connector versions 1.0 and 2.0

(where applicable).

If you don’t have an existing IAM role or secret, you can use the console to create a new IAM role

and Secrets Manager secret when you connect your Alfresco data source to Amazon Kendra. If you

are using the API, you must provide the ARN of an existing IAM role and Secrets Manager secret,

and an index ID.
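If you use the API, the secret must exist before you create the data source. The following is a minimal Python sketch of storing basic authentication credentials and capturing the ARN. It assumes the boto3 Secrets Manager client and its create_secret call. The helper name store_secret, the secret name, and the credential values are placeholders, and the username and password keys are the ones listed for the Alfresco secret in the API connection instructions.

```python
import json


def store_secret(secrets_client, name, credentials):
    """Store credentials as a JSON secret and return the ARN of the new secret.

    secrets_client is expected to be boto3.client("secretsmanager").
    """
    response = secrets_client.create_secret(
        Name=name,
        SecretString=json.dumps(credentials),
    )
    return response["ARN"]


# Placeholder credentials for basic authentication.
basic_credentials = {"username": "example-user", "password": "example-password"}

# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   secret_arn = store_secret(
#       boto3.client("secretsmanager"), "alfresco-kendra-secret", basic_credentials
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before you run it against your AWS account.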

Connection instructions

To connect Amazon Kendra to your Alfresco data source, you must provide the necessary details

of your Alfresco data source so that Amazon Kendra can access your data. If you have not yet

conﬁgured Alfresco for Amazon Kendra, see Prerequisites.

Console

To connect Amazon Kendra to Alfresco

1. Sign in to the AWS Management Console and open the Amazon Kendra console.

2. From the left navigation pane, choose Indexes and then choose the index you want to use

from the list of indexes.

Note

You can choose to conﬁgure or edit your User access control settings under Index

settings.

3. On the Getting started page, choose Add data source.

4. On the Add data source page, choose Alfresco connector, and then choose Add

connector. If using version 2 (if applicable), choose Alfresco connector with the "V2.0" tag.

5. On the Specify data source details page, enter the following information:

a. In Name and description, for Data source name—Enter a name for your data source.

You can include hyphens but not spaces.

b. (Optional) Description—Enter an optional description for your data source.

c. In Default language—Choose a language to ﬁlter your documents for the index.

Unless you specify otherwise, the language defaults to English. Language speciﬁed in

the document metadata overrides the selected language.

d. In Tags, for Add new tag—Include optional tags to search and ﬁlter your resources or

track your AWS costs.

e. Choose Next.

6. On the Deﬁne access and security page, enter the following information:

a. Alfresco type—Choose whether you use Alfresco On-Premises/server or Alfresco Cloud

(Platform as a Service).

b. Alfresco repository URL—Enter your Alfresco repository URL. For example, if you use

Alfresco Cloud (PaaS), the repository URL could be https://company.alfrescocloud.com.

Or, if you use Alfresco On-Premises, the repository URL could be

https://company-alfresco-instance.company-domain.suffix:port.

c. Alfresco user application URL—Enter your Alfresco user interface URL. You can get

this URL from your Alfresco administrator. For example, the user interface

URL could be https://example.com.

d. SSL certiﬁcate location—Enter the path to the SSL certiﬁcate stored in an Amazon S3

bucket. You use this to connect to Alfresco On-Premises with a secure SSL connection.

e. Authorization—Turn on or oﬀ access control list (ACL) information for your

documents, if you have an ACL and want to use it for access control. The ACL speciﬁes

which documents users and groups can access. The ACL information is used to

ﬁlter search results based on the user or their group access to documents. For more

information, see User context ﬁltering.

f. Authentication—Choose Basic authentication or OAuth 2.0 authentication. Then

choose an existing Secrets Manager secret or create a new secret to store your Alfresco

credentials. If you choose to create a new secret, an AWS Secrets Manager secret

window opens.

If you chose Basic authentication, enter a name for the secret, the Alfresco user name,

and password.

If you chose OAuth 2.0 authentication, enter a name for the secret, client ID, client

secret, and token URL.

g. Virtual Private Cloud (VPC)—You can choose to use a VPC. If so, you must add

Subnets and VPC security groups.

h. Identity crawler—Specify whether to turn on Amazon Kendra’s identity crawler. The

identity crawler uses the access control list (ACL) information for your documents to

ﬁlter search results based on the user or their group access to documents. If you have

an ACL for your documents and choose to use your ACL, you can then also choose

to turn on Amazon Kendra’s identity crawler to conﬁgure user context ﬁltering of

search results. Otherwise, if identity crawler is turned oﬀ, all documents can be publicly

searched. If you want to use access control for your documents and identity crawler is

turned oﬀ, you can alternatively use the PutPrincipalMapping API to upload user and

group access information for user context ﬁltering.

i. IAM role—Choose an existing IAM role or create a new IAM role to access your

repository credentials and index content.

Note

IAM roles used for indexes cannot be used for data sources. If you are unsure if

an existing role is used for an index or FAQ, choose Create a new role to avoid

errors.

j. Choose Next.

7. On the Conﬁgure sync settings page, enter the following information:

a. Sync scope—Set limits for crawling certain content and filter content using regular

expression patterns.

i. Content—Choose whether to crawl content marked with 'Aspects' in Alfresco,

content within a specific Alfresco site, or content across all your Alfresco sites.

ii. (Optional) Additional configuration—Set the following settings:

• Include comments—Choose to include comments in Alfresco Document library

and Blog.

• Regex patterns—Regular expression patterns to include or exclude certain files.

b. Sync mode—Choose how you want to update your index when your data source

content changes. When you sync your data source with Amazon Kendra for the first

time, all content is crawled and indexed by default. You must run a full sync of your

data if your initial sync failed, even if you don't choose full sync as your sync mode

option.

• Full sync: Freshly index all content, replacing existing content each time your data

source syncs with your index.

• New, modified, deleted sync: Index only new, modified, and deleted content each

time your data source syncs with your index. Amazon Kendra can use your data

source's mechanism for tracking content changes and index content that changed

since the last sync.

c. In Sync run schedule, for Frequency—Choose how often to sync your data source

content and update your index.

d. Choose Next.

8. On the Set ﬁeld mappings page, enter the following information:

a. Select from the Amazon Kendra generated default data source ﬁelds that you want to

map to your index.

b. To add custom data source ﬁelds, create an index ﬁeld name to map to and the ﬁeld

data type.

c. Choose Next.

9. On the Review and create page, check that the information you have entered is correct

and then select Add data source. You can also choose to edit your information from this

page. Your data source will appear on the Data sources page after the data source has been

added successfully.
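The identity crawler step above mentions the PutPrincipalMapping API as an alternative when the identity crawler is turned off. The following is a minimal Python sketch of building such a request, assuming the boto3 Kendra client and its put_principal_mapping call. The helper names, the group ID, and the user IDs are hypothetical placeholders.

```python
def principal_mapping_request(index_id, data_source_id, group_id, user_ids):
    """Build keyword arguments that map a list of users to one group."""
    return {
        "IndexId": index_id,
        "DataSourceId": data_source_id,
        "GroupId": group_id,
        "GroupMembers": {
            "MemberUsers": [{"UserId": user_id} for user_id in user_ids],
        },
    }


def upload_group(kendra_client, request):
    """Send the mapping. kendra_client is expected to be boto3.client("kendra")."""
    kendra_client.put_principal_mapping(**request)


# Placeholder identifiers for illustration only.
request = principal_mapping_request(
    index_id="example-index-id",
    data_source_id="example-data-source-id",
    group_id="engineering",
    user_ids=["user-1@example.com", "user-2@example.com"],
)
```

Each call describes the members of a single group, so a directory with many groups would need one request per group.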

API

To connect Amazon Kendra to Alfresco

You must specify a JSON of the data source schema using the TemplateConfiguration API. You

must provide the following information:

• Data source—Specify the data source type as ALFRESCO when you use the

TemplateConfiguration JSON schema. Also specify the data source as TEMPLATE when you

call the CreateDataSource API.

• Alfresco site ID—Specify the Alfresco site ID.

• Alfresco repository URL—Specify the Alfresco repository URL. You can get the repository

URL from your Alfresco administrator. For example, if you use Alfresco Cloud (PaaS),

the repository URL could be https://company.alfrescocloud.com. Or, if you use Alfresco

On-Premises, the repository URL could be

https://company-alfresco-instance.company-domain.suffix:port.

• Alfresco web application URL—Specify the Alfresco user interface URL. You can get

this URL from your Alfresco administrator. For example, the user interface URL could

be https://example.com.

• Authentication type—Specify which type of authentication you want to use, whether

OAuth2 or Basic.

• Alfresco type—Specify which type of Alfresco you use, whether PAAS (Cloud/Platform as a

Service) or ON_PREM (On-Premises).

• Secret Amazon Resource Name (ARN)—Provide the Amazon Resource Name (ARN) of an AWS

Secrets Manager secret that stores your authentication credentials. If you want to use basic

authentication, the secret stores your user name and password in a JSON structure with the

following keys:

{

"username": "user name",

"password": "password"

}

If you want to use OAuth 2.0 authentication, the secret is stored in a JSON structure with the

following keys:

{

"clientId": "client ID",

"clientSecret": "client secret",

"tokenUrl": "token URL"

}

• IAM role—Specify RoleArn when you call CreateDataSource to provide an IAM role with

permissions to access your Secrets Manager secret and to call the required public APIs for the

Alfresco connector and Amazon Kendra. For more information, see IAM roles for Alfresco data

sources.
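The required settings above come together in a single CreateDataSource call. The following is a minimal Python sketch of assembling that request, assuming the boto3 Kendra client and its create_data_source call. The helper name and all identifiers are placeholders. The template shown contains only the type value stated above; the remaining template keys (site ID, repository URL, authentication type, secret ARN, and so on) are deliberately omitted and must be taken from the Alfresco template schema.

```python
def create_data_source_request(name, index_id, role_arn, template,
                               subnet_ids=None, security_group_ids=None):
    """Build keyword arguments for a TEMPLATE data source."""
    request = {
        "Name": name,
        "IndexId": index_id,
        "Type": "TEMPLATE",  # the data source is TEMPLATE for CreateDataSource
        "RoleArn": role_arn,
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    # VpcConfiguration is optional and only added when both lists are supplied.
    if subnet_ids and security_group_ids:
        request["VpcConfiguration"] = {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        }
    return request


# Incomplete placeholder template: only "type" is taken from the text above.
template = {"type": "ALFRESCO"}

request = create_data_source_request(
    name="alfresco-data-source",
    index_id="example-index-id",
    role_arn="arn:aws:iam::111122223333:role/example-kendra-alfresco-role",
    template=template,
)

# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   response = boto3.client("kendra").create_data_source(**request)
```

Building the request as a plain dictionary makes it easy to review or log the configuration before any AWS call is made.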

You can also add the following optional features:

• Virtual Private Cloud (VPC)—Specify VpcConfiguration when you call

CreateDataSource. For more information, see Configuring Amazon Kendra to use an

Amazon VPC.

• Content type—The type of content that you want to crawl, whether content marked with

'Aspects' in Alfresco, content within a speciﬁc Alfresco site, or content across all your Alfresco

sites. You can also list speciﬁc 'Aspects' content.

• Inclusion and exclusion ﬁlters—Specify whether to include or exclude certain ﬁles.