CreateCorpus
POST /v1/create-corpus
Creates a corpus, which is a container to store data in.
Some tips for this API:
- This operation works with both Personal API Key and OAuth 2.0 (a JWT "Bearer Token") authentication. See the OAuth 2.0 documentation for details on setting it up and using it.
- The name of the corpus is the only required field.
- Filter attributes tell Vectara which metadata fields you'd like to run SQL-style filters against. If you need to change them after you've created the corpus, see the Replace Filter Attributes API.
- textless and customDimensions are features that are only available to Scale accounts.
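As a rough sketch of the tips above, the Python snippet below builds, but does not send, a CreateCorpus request whose body contains only the required name. It shows the two authentication options side by side. The host https://api.vectara.io and the header names customer-id and x-api-key are assumptions not stated on this page, and build_create_corpus_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumed host; only the path /v1/create-corpus comes from this page.
API_URL = "https://api.vectara.io/v1/create-corpus"


def build_create_corpus_request(customer_id, corpus_name, api_key=None, jwt=None):
    """Build (but do not send) a minimal CreateCorpus request.

    Pass exactly one of api_key (Personal API Key) or jwt (OAuth 2.0 token).
    The "customer-id" and "x-api-key" header names are assumptions.
    """
    if (api_key is None) == (jwt is None):
        raise ValueError("pass exactly one of api_key or jwt")
    headers = {
        "Content-Type": "application/json",
        "customer-id": str(customer_id),  # assumed header name
    }
    if api_key is not None:
        headers["x-api-key"] = api_key  # assumed header for a Personal API Key
    else:
        headers["Authorization"] = f"Bearer {jwt}"  # OAuth 2.0 JWT bearer token
    # The corpus name is the only required field.
    body = {"corpus": {"name": corpus_name}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_create_corpus_request(1234567, "my-docs", api_key="zqt_example")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Sending the request (for example with urllib.request.urlopen) is left out so the sketch makes no network calls.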
Request
Header Parameters
Enter the Customer ID to use for the request.
(Optional) Enter the timeout value of the request in seconds, such as 10S or 30S.
Default value: 30S
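The two header parameters above can be assembled as in the following sketch. It formats the timeout in the "10S" / "30S" style described above and omits the header when no timeout is given, so the documented 30S default applies. The header names customer-id and grpc-timeout are assumptions, since this page does not show them, and both helper functions are hypothetical.

```python
from typing import Dict, Optional


def timeout_value(seconds: int) -> str:
    """Format a timeout in whole seconds, e.g. 10 -> "10S"."""
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("timeout must be a positive whole number of seconds")
    return f"{seconds}S"


def header_params(customer_id: int, timeout_seconds: Optional[int] = None) -> Dict[str, str]:
    """Build the header parameters; both header names are assumptions."""
    headers = {"customer-id": str(customer_id)}
    # Without an explicit timeout the header is omitted (documented default: 30S).
    if timeout_seconds is not None:
        headers["grpc-timeout"] = timeout_value(timeout_seconds)
    return headers


print(header_params(1234567, timeout_seconds=10))
```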
Body (required, application/json)
corpus object (required)
The Corpus ID. This value is ignored during Corpus creation.
The name of the corpus.
A description for the corpus.
The time at which the corpus was provisioned. This value is ignored during Corpus creation.
Whether the corpus is enabled for use or not. This value is ignored during Corpus creation.
The default query encoder is designed for standard question-answering queries, where the indexed text contains the answer. Swapping the index encoder is rare, but it can help match questions directly to questions. This is useful when you have an FAQ dataset and want to match a user's question directly to the questions in the FAQ.
When a corpus is "textless", Vectara does not store the original text. Instead, Vectara converts the text to vectors and retains only those vectors and the metadata.
Encryption is on by default and cannot be turned off.
This is an advanced setting for changing the underlying model type. The default value is "1", which is Vectara's high-performing global model. Some paying customers can swap the underlying model by contacting our support team.
An optional maximum size of the metadata that each document can contain.
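The fields above include several that the server ignores on creation (the corpus ID, provisioning time, and enabled flag) and one setting that cannot be changed (encryption). The sketch below builds a corpus object that drops the ignored fields and refuses to disable encryption. The JSON names id, dtProvision, enabled, encrypted, and swapIenc are assumptions for illustration, because this page describes these fields without showing their names, and build_corpus is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Assumed JSON names of the fields this page says are ignored during creation.
SERVER_ASSIGNED = ("id", "dtProvision", "enabled")


def build_corpus(name: str, **fields: Any) -> Dict[str, Any]:
    """Return a corpus object for CreateCorpus, dropping ignored fields."""
    if not name:
        raise ValueError("name is the only required field and must be non-empty")
    if fields.get("encrypted") is False:
        # Encryption is on by default and cannot be turned off.
        raise ValueError("encryption cannot be disabled")
    corpus: Dict[str, Any] = {"name": name}
    for key, value in fields.items():
        if key in SERVER_ASSIGNED or value is None:
            continue  # ignored by the server on creation, so don't send it
        corpus[key] = value
    return corpus


corpus = build_corpus(
    "faq",
    description="Customer FAQ",
    id=99,          # ignored on creation, so it is dropped
    swapIenc=True,  # match questions to questions (assumed field name)
    textless=False,
)
print(json.dumps({"corpus": corpus}))
```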
customDimensions object[]
The name of the custom dimension. The maximum length of the name is 8 characters.
A description for the custom dimension.
The default weight to give this dimension when running queries. A value of 0.0, for example, gives it no weight at all.
The default value to give to documents for this custom dimension.
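A custom dimension pairs a short name with a query-time weight and a per-document default value. The dataclass below enforces the 8-character name limit stated above before producing JSON. The JSON keys servingDefault and indexingDefault are assumptions for the weight and document-default fields, which this page describes but does not name. Example values are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Union

MAX_NAME_LEN = 8  # maximum custom dimension name length stated above


@dataclass(frozen=True)
class CustomDimension:
    name: str
    serving_default: float   # default query-time weight; 0.0 gives no weight
    indexing_default: float  # default value for documents that don't set one
    description: str = ""

    def __post_init__(self) -> None:
        if not 0 < len(self.name) <= MAX_NAME_LEN:
            raise ValueError(f"name must be 1-{MAX_NAME_LEN} characters: {self.name!r}")

    def to_json(self) -> Dict[str, Union[str, float]]:
        # JSON key names here are assumptions for illustration.
        out: Dict[str, Union[str, float]] = {
            "name": self.name,
            "servingDefault": self.serving_default,
            "indexingDefault": self.indexing_default,
        }
        if self.description:
            out["description"] = self.description
        return out


dims = [CustomDimension("recency", serving_default=0.5, indexing_default=0.0).to_json()]
print(dims)
```

Validating in __post_init__ means an over-long name fails when the object is constructed, before any request is built.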
filterAttributes object[]
Name of the field, as seen in metadata.
An optional description.
Whether the field is indexed for maximum query speed.
Possible values: [FILTER_ATTRIBUTE_TYPE__UNDEFINED, FILTER_ATTRIBUTE_TYPE__INTEGER, FILTER_ATTRIBUTE_TYPE__INTEGER_LIST, FILTER_ATTRIBUTE_TYPE__REAL, FILTER_ATTRIBUTE_TYPE__REAL_LIST, FILTER_ATTRIBUTE_TYPE__TEXT, FILTER_ATTRIBUTE_TYPE__TEXT_LIST, FILTER_ATTRIBUTE_TYPE__BOOLEAN]
Default value: FILTER_ATTRIBUTE_TYPE__UNDEFINED
Possible values: [FILTER_ATTRIBUTE_LEVEL__UNDEFINED, FILTER_ATTRIBUTE_LEVEL__DOCUMENT, FILTER_ATTRIBUTE_LEVEL__DOCUMENT_PART]
Default value: FILTER_ATTRIBUTE_LEVEL__UNDEFINED
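Each filterAttributes entry combines a metadata field name with one of the type and level values listed above. The sketch below builds entries from Python enums, so a misspelled value fails early. As a design choice it leaves out the UNDEFINED defaults and requires an explicit type and level. The JSON keys name, indexed, type, level, and description are assumptions, and filter_attribute is a hypothetical helper.

```python
from enum import Enum
from typing import Dict, Union


class AttrType(Enum):
    INTEGER = "FILTER_ATTRIBUTE_TYPE__INTEGER"
    INTEGER_LIST = "FILTER_ATTRIBUTE_TYPE__INTEGER_LIST"
    REAL = "FILTER_ATTRIBUTE_TYPE__REAL"
    REAL_LIST = "FILTER_ATTRIBUTE_TYPE__REAL_LIST"
    TEXT = "FILTER_ATTRIBUTE_TYPE__TEXT"
    TEXT_LIST = "FILTER_ATTRIBUTE_TYPE__TEXT_LIST"
    BOOLEAN = "FILTER_ATTRIBUTE_TYPE__BOOLEAN"


class AttrLevel(Enum):
    DOCUMENT = "FILTER_ATTRIBUTE_LEVEL__DOCUMENT"
    DOCUMENT_PART = "FILTER_ATTRIBUTE_LEVEL__DOCUMENT_PART"


def filter_attribute(name: str, attr_type: AttrType, level: AttrLevel,
                     indexed: bool = True, description: str = "") -> Dict[str, Union[str, bool]]:
    """Build one filterAttributes entry; the JSON key names are assumptions."""
    if not name:
        raise ValueError("name must match a metadata field name")
    attr: Dict[str, Union[str, bool]] = {
        "name": name,
        "indexed": indexed,  # indexed fields are optimized for query speed
        "type": attr_type.value,
        "level": level.value,
    }
    if description:
        attr["description"] = description
    return attr


filter_attributes = [
    filter_attribute("lang", AttrType.TEXT, AttrLevel.DOCUMENT),
    filter_attribute("page", AttrType.INTEGER, AttrLevel.DOCUMENT_PART, indexed=False),
]
print(filter_attributes)
```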
Responses
200: A successful response.
Content type: application/json
Schema
The Corpus ID that was created.
status object
Possible values: [OK, FAILURE, UNKNOWN, INVALID_ARGUMENT, DEADLINE_EXCEEDED, ALREADY_EXISTS, PERMISSION_DENIED, RESOURCE_EXHAUSTED, FAILED_PRECONDITION, ABORTED, OUT_OF_RANGE, UNIMPLEMENTED, INTERNAL, UNAVAILABLE, DATA_LOSS, UNAUTHENTICATED,
BAD_REQUEST, UNAUTHORIZED, FORBIDDEN, NOT_FOUND, METHOD_NOT_ALLOWED, CONFLICT, UNSUPPORTED_MEDIA_TYPE, TOO_MANY_REQUESTS, INTERNAL_SERVER_ERROR, NOT_IMPLEMENTED, SERVICE_UNAVAILABLE, INSUFFICIENT_STORAGE, UNPARSEABLE_RESPONSE,
DISABLED_CUSTOMER, INVALID_CUSTOMER_ID, DISABLED_CORPUS, INVALID_CORPUS_ID, DISABLED_API_KEY, EXPIRED_API_KEY, INVALID_API_KEY, CMK_INACCESSIBLE,
QRY__DISABLED_CORPUS, QRY__DOCUMENT_DB_FAILURE, QRY__ENCODER_FAILURE, QRY__INTERRUPTED, QRY__INVALID_CORPUS, QRY__INVALID_START, QRY__INVALID_NUM_RESULTS, QRY__INVALID_CONTEXT, QRY__MISSING_QUERY, QRY__MISSING_CORPUS, QRY__TIMEOUT, QRY__TOO_MANY_CORPORA, QRY__TOO_MANY_QUERIES, QRY__VECTOR_INDEX_FAILURE, QRY__INVALID_DIMENSION, QRY__INVALID_CLIENTKEY, QRY__DECRYPTION_FAILURE, QRY__INVALID_RERANKER, QRY__PARTIAL_RERANK, QRY__RERANK_FAILURE, QRY__TOO_MANY_RESULT_ROWS, QRY__PARTIAL_RETRIEVAL,
QRY__SMRY__INVALID_SUMMARIZER_PROMPT, QRY__SMRY__INVALID_SUMMARY_LANG, QRY__SMRY__UNSUPPORTED_SUMMARY_LANG, QRY__SMRY__PARTIAL_SUMMARY, QRY__SMRY__NO_QUERY_RESULTS, QRY__SMRY__EVAL_UNSUPPORTED_LANG, QRY__SMRY__EVAL_FAILURE,
QRY__GEN__NO_QUERY_RESULTS, QRY__GEN__UNPARSEABLE_MODEL_PARAMS,
CX_SPECS__INVALID_JSON, CX_SPECS__UNREGISTERED_TYPE, CX_SPECS__MISSING_SPEC, CX_SPECS__MISSING_TYPE, CX_SPECS__UNPARSEABLE_SPEC,
ADM__INVALID_CUSTOMER_ID, ADM__INVALID_CORPUS_ID, ADM__INVALID_ENCODER_ID, ADM__INVALID_ROLE_ID, ADM__ROLE_ALREADY_EXISTS, ADM__ONLY_ONE_OWNER_SUPPORTED, ADM__INVALID_PERMISSION, ADM__ROLECREATION_FAILURE, ADM__USER_EMAIL_NOT_AVAIALBLE, ADM__USERNAME_NOT_AVAILABLE,
ADM__SIGNUP_MISSING_NAME, ADM__SIGNUP_MISSING_ORG, ADM__SIGNUP_MISSING_EMAIL, ADM__SIGNUP_MISSING_PAYMENT, ADM__SIGNUP_MISSING_PLAN, ADM__SIGNUP_MISSING_PASSWORD, ADM__SIGNUP_INVALID_NAME, ADM__SIGNUP_INVALID_ORG, ADM__SIGNUP_INVALID_EMAIL, ADM__SIGNUP_INVALID_PAYMENT, ADM__SIGNUP_INVALID_PLAN, ADM__SIGNUP_INVALID_PASSWORD, ADM__SIGNUP_INVALID_ACCOUNT_ALIAS, ADM__SIGNUP_INVALID_EMAIL_VALIDATION_CODE, ADM__SIGNUP_MISSING_COUNTRY_CODE, ADM__SIGNUP_ROOT_EMAIL_NOT_AVAILABLE,
ADM__CUST_MARK_DELETE_FAILED, ADM__CUST_FAISS_DEALLOC_FAILED, ADM__CUST_ALREADY_ACTIVE, ADM__CUST_REACTIVATE_FAILED, ADM__CUST_ENABLEMENT_FAILED, ADM__CORPUS_LIMIT_REACHED, ADM__STRIPE_CARD_DECLINED, ADM__STRIPE_PROCESSING_ERROR, ADM__EMAIL_VALIDATION_REQUEST_NOT_FOUND, ADM__EMAIL_NOT_VALIDATED,
ADM__CHANGE_PLAN__NO_CURRENT_PLAN, ADM__CHANGE_PLAN__REQUIRES_MANUAL_CHANGE, ADM__CHANGE_PLAN__INVALID_PLAN_ID, ADM__CHANGE_PLAN__NO_PAYMENT_SOURCE, ADM__CHANGE_PLAN__INVALID_EFFECTIVE_DATE, ADM__CHANGE_PLAN__CONFLICTING_CHANGE,
SCM__MISCONFIGURED_CONNECTION, STATS_DB_READ_FAILURE, VDB__TEXT_READ_FAILURE,
REBUILD__LOW_RECALL, REBUILD__INDEX_UPLOAD_FAILURE, REBUILD__UPDATE_JOURNAL_FAILURE, REBUILD__UPDATE_FAISSPARAMS_FAILURE, REBUILD__NO_DATA, REBUILD__EVALUATION,
IDX__TRANSIENT_PARTIAL_DELETION_FAILURE, IDX__PERMANENT_PARTIAL_DELETION_FAILURE,
CALB__INVALID_JSON, CALB__INVALID_SPEC, CALB__UNREGISTERED_TYPE, CALB__MISSING_SPEC, CALB__MISSING_TYPE, CALB__UNPARSABLE_SPEC]
Default value: OK
Example (from schema):
{
  "corpusId": 0,
  "status": {
    "code": "OK",
    "statusDetail": "string"
  }
}
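A client typically checks status.code before trusting corpusId. The sketch below parses a body shaped like the example above and raises if the code is anything other than OK. As an assumption, a missing status or code is treated as the documented default, OK. corpus_id_from_response is a hypothetical helper.

```python
import json


def corpus_id_from_response(raw: str) -> int:
    """Return the new corpus ID, or raise if status.code is not OK."""
    resp = json.loads(raw)
    status = resp.get("status") or {}
    # Assumption: a missing code means the documented default value, OK.
    code = status.get("code", "OK")
    if code != "OK":
        raise RuntimeError(f"CreateCorpus failed with {code}: {status.get('statusDetail', '')}")
    return int(resp["corpusId"])


print(corpus_id_from_response('{"corpusId": 7, "status": {"code": "OK", "statusDetail": ""}}'))
```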
default: An unexpected error response.
Content type: application/json
Schema
details object[]
A URL/resource name that uniquely identifies the type of the serialized protocol buffer message. This string must contain at least one "/" character. The last segment of the URL's path must represent the fully qualified name of the type (as in path/google.protobuf.Duration). The name should be in a canonical form (e.g., a leading "." is not accepted).
In practice, teams usually precompile into the binary all types that they expect it to use in the context of Any. However, for URLs which use the scheme http, https, or no scheme, one can optionally set up a type server that maps type URLs to message definitions as follows:
- If no scheme is provided, https is assumed.
- An HTTP GET on the URL must yield a google.protobuf.Type value in binary format, or produce an error.
- Applications are allowed to cache lookup results based on the URL, or have them precompiled into a binary to avoid any lookup. Therefore, binary compatibility needs to be preserved on changes to types. (Use versioned type names to manage breaking changes.)
Note: this functionality is not currently available in the official protobuf release, and it is not used for type URLs beginning with type.googleapis.com.
Schemes other than http, https (or the empty scheme) might be used with implementation-specific semantics.
Example (from schema):
{
  "code": 0,
  "message": "string",
  "details": [
    {
      "@type": "string"
    }
  ]
}
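The type URL rules above can be checked mechanically. The sketch below parses an error body shaped like the example and extracts the fully qualified type name from each @type URL. The name is the last path segment, which must be present and must not start with ".". The function names and the sample URL type.googleapis.com/google.rpc.BadRequest are illustrative assumptions, not values returned by this endpoint.

```python
import json
from typing import List, Tuple


def type_name_from_url(type_url: str) -> str:
    """Return the fully qualified type name (last path segment) of a type URL."""
    if "/" not in type_url:
        raise ValueError(f"type URL must contain at least one '/': {type_url!r}")
    name = type_url.rsplit("/", 1)[1]
    if not name or name.startswith("."):
        raise ValueError(f"type name must be non-empty and canonical: {type_url!r}")
    return name


def parse_error(raw: str) -> Tuple[int, str, List[str]]:
    """Split an error body into (code, message, detail type names)."""
    err = json.loads(raw)
    types = [type_name_from_url(d["@type"]) for d in err.get("details", [])]
    return err.get("code", 0), err.get("message", ""), types


body = '{"code": 3, "message": "bad corpus", "details": [{"@type": "type.googleapis.com/google.rpc.BadRequest"}]}'
print(parse_error(body))
```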