ES使用JSON作为数据交互格式,所以简单来说,JSON支持的数据类型,ES都支持。
1. core simple field types
- String: string
- Whole number: byte, short, integer, long
- Floating point: float, double
- Boolean: boolean
- Date: date
ES对于没有定义mapping的未知字段会采用dynamic mapping进行类型猜测,并且将其加入到type mapping中。这个行为可以通过 dynamic 配置项进行控制。
- true: Add new fields dynamically — the default
- false: Ignore new fields
- strict: Throw an exception if an unknown field is encountered
相应的mappings配置如下:
PUT /my_index
{
"mappings": {
"my_type": {
"dynamic": "strict",
"properties": {
"title": { "type": "string"},
"stash": {
"type": "object",
"dynamic": true
}
}
}
}
}
TIPS 日期格式
JSON并没有日期类型,日期是以特定字符串格式形式表示。比如postDate: "2014-09-24" 。ES能够根据字符串的格式猜测出这是一个Date类型。日期字符串格式可以通过format和dynamic_date_formats配置项指定(date-format),而且支持多个格式如:yyyy/MM/dd HH:mm:ss |
yyyy/MM/dd。 |
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"path" : "post_date",
"format" : "YYYY-MM-dd",
"default" : "1970-01-01"
}
}
}
可以通过date_detection配置项单独关闭日期类型检查:
curl -XPUT "http://localhost:9200/myindex" -d
{
"mappings": {
"tweet": {
"date_detection": false
}
}
}
2. complex core field types
除了简单的基本类型,ES还支持如下复杂数据类型:
multi-value fields
对应JSON中的数组。如tag
{ "tag": [ "search", "nosql" ]}
但是事实上ES并没有array类型,因为默认就是支持的。具体参见array type。
empty fields
- empty string: “”
- null value: null
- empty array: []
- array_with_null_value: [null]
multi-level objects
对应JSON中的嵌套对象,例如:
{
"tweet": "Elasticsearch is very flexible",
"user": {
"id": "@johnsmith",
"gender": "male",
"age": 26,
"name": {
"full": "John Smith",
"first": "John",
"last": "Smith"
}
}
}
ES会自动检测和映射为object类型,也可以通过如下mapping配置:
{
"gb": {
"tweet": {
"properties": {
"tweet": { "type": "string" },
"user": {
"type": "object",
"properties": {
"id": { "type": "string" },
"gender": { "type": "string" },
"age": { "type": "long" },
"name": {
"type": "object",
"properties": {
"full": { "type": "string" },
"first": { "type": "string" },
"last": { "type": "string" }
}
}
}
}
}
}
}
}
NOTES && TIPS how inner objects are indexed
Lucene并没有嵌套对象的概念,事实上,A Lucene document consists of a flat list of key-value pairs. Lucene only indexes scalar or simple values, not complex datastructures. ES其实是做了如下平坦化处理:
{
"tweet": [elasticsearch, flexible, very],
"user.id": [@johnsmith],
"user.gender": [male],
"user.age": [26],
"user.name.full": [john, smith],
"user.name.first": [john],
"user.name.last": [smith]
}
arrays of inner objects
Finally, consider how an array containing inner objects would be indexed. Let’s say we have a followers array which looks like this:
{
"followers": [
{ "age": 35, "name": "Mary White"},
{ "age": 26, "name": "Alex Jones"},
{ "age": 19, "name": "Lisa Smith"}
]
}
This document will be flattened as we described above, but the result will look like this:
{
"followers.age": [19, 26, 35],
"followers.name": [alex, jones, lisa, smith, mary, white]
}
这其实是有问题的,ES提供了一个称之为nested objects的解决方案。比较恶心,这里不讨论。