...
I have a document with a nanosecond precision ISO timestamp in a string ("0000-00-00T00:00:00.000000000Z"). I do not control the schema or formatting of this document. When I try to use $dateFromString or $toDate on this field from an aggregation, I get the following error: Error parsing date string '2006-01-02T15:04:05.123456789Z'; 0: passing a time zone identifier as part of the string is not allowed '2' If I trim off the nanosecond digit (to 10 ns precision or 8 decimal places), it seems to handle it okay. Additionally, the ISODate() and Date() constructors are able to parse the string just fine.
david.storch commented on Thu, 8 Oct 2020 15:25:02 +0000: https://github.com/derickr/timelib/issues/93 has been fixed upstream in timelib by commit https://github.com/derickr/timelib/commit/12fff7ad830f79688e4ab66671dd2c0a143f6ba2. This means that the bug should be resolved when we next upgrade our vendorized copy of timelib to a version that incorporates the fix. david.storch commented on Mon, 5 Oct 2020 21:10:18 +0000: I've filed an issue upstream with timelib to track the issue of not permitting date strings with nanosecond precision: see https://github.com/derickr/timelib/issues/93. david.storch commented on Wed, 23 Sep 2020 22:36:52 +0000: This patch fixes the issue for input strings that have up to 9 digits of precision expressed in fractional seconds: diff --git a/src/third_party/timelib-2018.01/parse_date.c b/src/third_party/timelib-2018.01/parse_date.c index ed89dd4dbc..c25ec889af 100644 --- a/src/third_party/timelib-2018.01/parse_date.c +++ b/src/third_party/timelib-2018.01/parse_date.c @@ -13109,7 +13109,9 @@ yy807: s->time->i = timelib_get_nr((char **) &ptr, 2); s->time->s = timelib_get_nr((char **) &ptr, 2); if (*ptr == '.') { - s->time->us = timelib_get_frac_nr((char **) &ptr, 9); + // Allow 10 characters: the '.' plus 9 significant digits for input strings + // that have up to nanosecond precision. + s->time->us = timelib_get_frac_nr((char **) &ptr, 10); if (*ptr) { /* timezone is optional */ s->time->z = timelib_parse_zone((char **) &ptr, &s->time->dst, s->time, &tz_not_found, s->tzdb, tz_get_wrapper); if (tz_not_found) { There are a few more questions which we need to investigate: Why is the error reporting so confusing? Even with the fix above, if you specify more than 9 digits of precision, the system will return a confusing error. This is probably a separate issue that should be corrected. Is 9 digits the correct amount of precision to support? The user could always have even more precise times, but it seems like finer than nanosecond granularity is uncommon. We should try to upstream a fix. But if we do apply a fix locally, we should probably do it in parse_date.re, and then use re2c to regenerate parse_date.c. david.storch commented on Wed, 23 Sep 2020 22:15:23 +0000: This appears to be a bug in timelib, the third party library that we use for time calculations. The timelib code uses re2c to generate date parsing code, which makes this a bit tricky to debug. It looks like we have both parse_date.re and the corresponding generated file, parse_date.c checked into the third_party directory. We only run re2c when we upgrade the timelib code. The build process relies on the generated file, parse_date.c. I was able to patch the generated file so that the timelib code generates debug output during date string parsing: --- a/src/third_party/timelib-2018.01/parse_date.c +++ b/src/third_party/timelib-2018.01/parse_date.c @@ -117,13 +117,8 @@ typedef unsigned char uchar; } \ } -#ifdef DEBUG_PARSER #define DEBUG_OUTPUT(s) printf("%s\n", s); #define YYDEBUG(s,c) { if (s != -1) { printf("state: %d ", s); printf("[%c]\n", c); } } -#else -#define DEBUG_OUTPUT(s) -#define YYDEBUG(s,c) -#endif typedef struct _timelib_elems { unsigned int c; /* Number of elements */ In the problematic case, this resulted in the following debug output: state: 0 [2] state: 27 [2] state: 1027 [0] state: 1030 [0] state: 1031 [6] state: 752 [-] state: 937 [0] state: 960 [1] state: 961 [-] state: 962 [0] state: 965 [2] state: 966 [T] state: 967 [1] state: 969 [5] state: 970 [:] state: 971 [0] state: 972 [4] state: 973 [:] state: 974 [0] state: 976 [5] state: 977 [.] state: 978 [1] state: 979 [2] state: 978 [2] state: 979 [3] state: 978 [3] state: 979 [4] state: 978 [4] state: 979 [5] state: 978 [5] state: 979 [6] state: 978 [6] state: 979 [7] state: 978 [7] state: 979 [8] state: 978 [8] state: 979 [9] state: 978 [9] state: 979 [Z] state: 807 [Z] xmlrpc | xmlrpcnocolon | soap | wddx | exif state: 0 [Z] state: 46 [Z] state: 3 [] tzcorrection | tz state: 0 [] state: 51 [] state: 52 [] state: 0 [] I'm not sure how useful the output from the generated re2c code is, but the following lines are interesting: xmlrpc | xmlrpcnocolon | soap | wddx | exif tzcorrection | tz I believe this first line consists of various abbreviations for date string formats. The input date string matches the "soap" format, which is YYYY-MM-DDTHH:MM:SS.FFFFFFFFF with an optional timezone: soap = year4 "-" monthlz "-" daylz "T" hour24lz ":" minutelz ":" secondlz frac tzcorrection?; Since the input string has the "Z" for zulu time, we are falling through to the code for parsing the timezone, and it is here that something is going wrong. I'll have to dig a bit further to determine how to fix it. asya commented on Tue, 14 Jul 2020 19:08:33 +0000: Here is an interesting observation of where this fails and where it succeeds: /* using agg {$set:{d:{$convert:{input:"$dateString", onError:"cannot convert",to:"date"}} */ { "dateString" : "2006-01-02T15:04:05.1234", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.12345", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.123456", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.1234567", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.123456Z", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.12345678", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.1234567Z", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.123456781", "d" : "cannot convert" } { "dateString" : "2006-01-02T15:04:05.123456789", "d" : "cannot convert" } { "dateString" : "2006-01-02T15:04:05.12345678A", "d" : ISODate("2006-01-02T14:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.12345678D", "d" : ISODate("2006-01-02T11:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.12345678Z", "d" : ISODate("2006-01-02T15:04:05.123Z") } { "dateString" : "2006-01-02T15:04:05.123456789Z", "d" : "cannot convert" } { "dateString" : "2006-01-02T15:04:05.12345678DZ", "d" : "cannot convert" } { "dateString" : "2006-01-02T15:04:05.12345678DZB", "d" : "cannot convert" } { "dateString" : "2006-01-02T15:04:05.12345678DZBF", "d" : "cannot convert" } Note that up to eight characters after the decimal converts just fine (even though we don't technically support nanoseconds or even microseconds, so they properly just drop the digits past first three). The 9th character when it's a number causes failure. If it's a letter (any letter) then it's ignored. If there are only eight characters after the dot, it works whether that character is a number or a letter. So it appears that it always looks to strip off the last letter (if there is more than one it fails) and then as long as you're left with at most eight digits, it succeeds.
> use test switched to db test > version() 4.2.8 > db.test.find({}) {{ { "_id" : ObjectId("5efbaba4250838017f12c045"), "dateString" : "2006-01-02T15:04:05.123456789Z" } }} > db.test.aggregate([\{$project: {date: {$dateFromString: {dateString: '$dateString'}}}}]) 2020-06-30T14:22:04.013-0700 E QUERY [js] uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "Error parsing date string '2006-01-02T15:04:05.123456789Z'; 0: passing a time zone identifier as part of the string is not allowed '2'", "code" : 241, "codeName" : "ConversionFailure" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:583:17 assert.commandWorked@src/mongo/shell/assert.js:673:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1012:12 @(shell):1:1 > db.test.aggregate([{$project: {date: {$dateFromString: {dateString: {$substrBytes: ['$dateString', 0, 28]}}}])}} {{ { "_id" : ObjectId("5efbaba4250838017f12c045"), "date" : ISODate("2006-01-02T15:04:05.123Z") } }} > db.test.aggregate([{$project: {date: {$dateFromString: {dateString: {$substrBytes: ['$dateString', 0, 29]}}}])}} 2020-06-30T14:21:22.393-0700 E QUERY [js] uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "Error parsing date string '2006-01-02T15:04:05.123456789'; 0: passing a time zone identifier as part of the string is not allowed '2'", "code" : 241, "codeName" : "ConversionFailure" } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:583:17 assert.commandWorked@src/mongo/shell/assert.js:673:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1012:12 @(shell):1:1 > ISODate('2006-01-02T15:04:05.123456789Z') ISODate("2006-01-02T15:04:05.123Z") > Date('2006-01-02T15:04:05.123456789Z') Tue Jun 30 2020 14:23:41 GMT-0700 (PDT) > db.test.updateOne({_id: ObjectId('5efbaba4250838017f12c045')}, {$set: {'dateString': '2006-01-02T15:04:05.12345678Z')}} {{ { "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 } }} > db.test.aggregate([\{$project: {date: {$dateFromString: {dateString: '$dateString'}}}}]) {{ { "_id" : ObjectId("5efbaba4250838017f12c045"), "date" : ISODate("2006-01-02T15:04:05.123Z") } }} > db.test.updateOne({_id: ObjectId('5efbaba4250838017f12c045')}, {$set: {'dateString': '2006-01-02T15:04:05.123456789Z')}} {{ { "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 } }} > db.test.aggregate([\{$project: {date: {$dateFromString: {dateString: '$dateString'}}}}]) 2020-06-30T14:37:05.058-0700 E QUERY [js] uncaught exception: Error: command failed: { {{ "ok" : 0,}} {{ "errmsg" : "Error parsing date string '2006-01-02T15:04:05.123456789Z'; 0: passing a time zone identifier as part of the string is not allowed '2'",}} {{ "code" : 241,}} {{ "codeName" : "ConversionFailure"}} } : aggregate failed : _getErrorWithCode@src/mongo/shell/utils.js:25:13 doassert@src/mongo/shell/assert.js:18:14 _assertCommandWorked@src/mongo/shell/assert.js:583:17 assert.commandWorked@src/mongo/shell/assert.js:673:16 DB.prototype._runAggregate@src/mongo/shell/db.js:266:5 DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1012:12 @(shell):1:1 >