MDEV-37262 XMLISVALID() schema validation function#5171
Conversation
XMLTYPE column added. Type_handler::get_column_attributes() added so parser can check if unexpected attributes were specified for the UDT column.
XMLVALID function added to the XMLTYPE plugin.
|
Alexey Botchkov seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
There was a problem hiding this comment.
Code Review
This pull request introduces a new 'xmltype' data type plugin and an 'XMLISVALID' function, alongside a validation mechanism to prevent unsupported attributes (like length, decimals, character set, or SRID) on various data types (e.g., 'vector', 'INET4', 'INET6', 'UUID'). While the changes are valuable, several critical issues must be addressed: the source file 'item_func_xml_isvalid.cc' is missing from the PR, causing compilation failures; the 'typeid' comparison in 'Item_char_typecast::eq' incorrectly compares pointer types instead of dereferenced objects; a potential null pointer dereference exists on 'last_field' in 'LEX::set_field_type_udt'; and there is a potential data race on the static 'xml_handler' initialization. Additionally, checking the string length before parsing in 'Item_xmltype_typecast::val_str' would improve consistency and robustness.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA | ||
|
|
||
| MYSQL_ADD_PLUGIN(type_xmltype | ||
| plugin.cc sql_type_xmltype.cc item_func_xml_isvalid.cc |
There was a problem hiding this comment.
| return 1; | ||
| if (item->type() != FUNC_ITEM || | ||
| functype() != ((Item_func*)item)->functype()) | ||
| if (typeid(this) != typeid(item)) |
There was a problem hiding this comment.
Using typeid(this) != typeid(item) compares the pointer types (Item_char_typecast* vs const Item*) instead of the dynamic types of the objects themselves. Since these pointer types are statically different, this check will always evaluate to true (and return 0), breaking the correctness of the eq method. To compare the dynamic types of the polymorphic objects, you must dereference the pointers.
if (typeid(*this) != typeid(*item))| if (!res) | ||
| return NULL; | ||
|
|
||
| if (check_parse_xml(res->ptr(), res->length(), res->charset()) != MY_XML_OK) |
There was a problem hiding this comment.
For consistency with Field_xmltype::store and to prevent potential null-pointer dereferences or unnecessary parsing of empty/too-short strings, check if res->length() < 4 before calling check_parse_xml.
if (res->length() < 4 ||
check_parse_xml(res->ptr(), res->length(), res->charset()) != MY_XML_OK)| column_attributes|= last_field->get_attr_uint32(0) ? | ||
| Type_handler::ATTR_SRID : 0; |
There was a problem hiding this comment.
Defensive programming: last_field can potentially be NULL in certain parser contexts or error states. It should be checked for NULL before calling get_attr_uint32(0) to avoid a potential null pointer dereference crash.
column_attributes|= (last_field && last_field->get_attr_uint32(0)) ?
Type_handler::ATTR_SRID : 0;| if (!xml_handler) | ||
| xml_handler= Type_handler::handler_by_name(thd, name); |
There was a problem hiding this comment.
There is a potential data race on the static member Item_func_xml_update::xml_handler when multiple threads concurrently execute fix_length_and_dec and initialize it. While the race might be benign (writing the same pointer), it is technically undefined behavior in C++ and can be flagged by TSAN. Consider making xml_handler a std::atomic<const Type_handler*> or using a thread-safe initialization pattern.
There was a problem hiding this comment.
Pull request overview
This PR introduces an XMLTYPE data type plugin and an XMLISVALID() function for XML Schema validation, while extending the SQL parser/type system to better control which column/type attributes are allowed for plugin-defined (UDT) types.
Changes:
- Add a new
type_xmltypeplugin implementingXMLTYPE, including casting behavior and replication/type aggregation support. - Add
XMLISVALID(xml, schema[, element])as a native function with extensive mysql-test coverage for validation behavior. - Introduce attribute-allowlisting for UDTs via
Type_handler::get_column_attributes()and enforce it in the parser, with a newER_UNSUPPORTED_DATA_TYPE_ATTRIBUTEerror.
Reviewed changes
Copilot reviewed 36 out of 37 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| strings/xml.c | Propagate XML value-handler failures for text nodes during parsing. |
| sql/table.cc | Validate assignability of VCOL default expressions when unpacking from FRM. |
| sql/sql_yacc.yy | Extend UDT grammar to accept opt_binary and adjust parser expectations. |
| sql/sql_type.h | Add column_attributes flags and default get_column_attributes() API. |
| sql/sql_type_vector.h | Restrict VECTOR UDT attributes to length-only. |
| sql/sql_type_vector.cc | Tighten VECTOR attribute handling and enforce binary charset internally. |
| sql/sql_type_geom.h | Restrict GEOMETRY attributes (length/dec/srid). |
| sql/sql_type_fixedbin.h | Restrict fixed binary type bundle attributes (none). |
| sql/sql_lex.h | Extend LEX APIs to pass charset/collation attributes into UDT type setup. |
| sql/sql_lex.cc | Enforce UDT attribute allowlist and add CAST UDT charset validation. |
| sql/share/errmsg-utf8.txt | Add ER_UNSUPPORTED_DATA_TYPE_ATTRIBUTE message. |
| sql/item_xmlfunc.h | Adjust UpdateXML item typing to support XMLTYPE result typing. |
| sql/item_xmlfunc.cc | Implement UpdateXML result type handler lookup/caching. |
| sql/item_timefunc.h | Expose Item_char_typecast::print_charset() for reuse by XMLTYPE cast printing. |
| sql/item_timefunc.cc | Refactor charset-printing out of Item_char_typecast::print(). |
| plugin/type_xmltype/CMakeLists.txt | Build definition for new XMLTYPE plugin sources. |
| plugin/type_xmltype/plugin.cc | Register xmltype data type plugin and XMLISVALID function plugin. |
| plugin/type_xmltype/sql_type_xmltype.h | Declare Type_handler_xmltype, Field_xmltype, and XMLISVALID item types. |
| plugin/type_xmltype/sql_type_xmltype.cc | Implement XMLTYPE handler/field/typecast logic and XML well-formedness checks. |
| plugin/type_xmltype/item_func_xml_isvalid.cc | Implement XML Schema parsing and XML validation execution for XMLISVALID(). |
| plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype.test | Add test coverage for XMLTYPE DDL/DML, casting, and basic validation. |
| plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype.result | Expected results for XMLTYPE test suite. |
| plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype_validation.test | Extensive XMLISVALID schema validation test cases. |
| plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype_validation.result | Expected results for validation suite. |
| plugin/type_xmltype/mysql-test/type_xmltype/suite.pm | Register the new mysql-test suite for XMLTYPE as default. |
| plugin/type_uuid/mysql-test/type_uuid/type_uuid.test | Add attribute-rejection tests using new error code for UUID UDT. |
| plugin/type_uuid/mysql-test/type_uuid/type_uuid.result | Expected results for UUID attribute-rejection tests. |
| plugin/type_test/plugin.cc | Declare allowed attributes for test UDT handlers. |
| plugin/type_test/mysql-test/type_test/type_test_int8.test | Add attribute-rejection tests for TEST_INT8 UDT. |
| plugin/type_test/mysql-test/type_test/type_test_int8.result | Expected results for TEST_INT8 attribute-rejection tests. |
| plugin/type_inet/mysql-test/type_inet/type_inet6.test | Add attribute-rejection tests for INET6 and adjust a prior test to new behavior. |
| plugin/type_inet/mysql-test/type_inet/type_inet6.result | Expected results for INET6 attribute-rejection tests. |
| plugin/type_inet/mysql-test/type_inet/type_inet4.test | Add attribute-rejection tests for INET4. |
| plugin/type_inet/mysql-test/type_inet/type_inet4.result | Expected results for INET4 attribute-rejection tests. |
| mysql-test/main/vector2.test | Update VECTOR attribute error expectation to new error code. |
| mysql-test/main/vector2.result | Expected results reflecting new VECTOR attribute error message. |
| include/my_xml.h | Add a new parser flag constant intended for “assert well formed” behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| \r, \n, \t characters. | ||
| */ | ||
| #define MY_XML_FLAG_SKIP_TEXT_NORMALIZATION 2 | ||
| #define MY_XML_FLAG_ASSERT_WELL_FORMED 4 | ||
|
|
| const Type_handler *Item_func_xml_update::xml_handler= NULL; | ||
|
|
||
|
|
||
| bool Item_func_xml_update::fix_length_and_dec(THD *thd) | ||
| { | ||
| static LEX_CSTRING name= {STRING_WITH_LEN("XMLTYPE")}; | ||
|
|
||
| if (!xml_handler) | ||
| xml_handler= Type_handler::handler_by_name(thd, name); | ||
|
|
||
| return Item_xml_str_func::fix_length_and_dec(thd); | ||
| } | ||
|
|
||
|
|
||
| const Type_handler *Item_func_xml_update::type_handler() const | ||
| { | ||
| return xml_handler ? xml_handler : Item_xml_str_func::type_handler(); | ||
| } |
| if (!args[1]->const_item()) | ||
| { | ||
| my_printf_error(ER_UNKNOWN_ERROR, | ||
| "Only constant XML Schema-s are supported.", MYF(0)); | ||
| return TRUE; |
| if (!(p->flags & MY_XML_FLAG_SKIP_TEXT_NORMALIZATION)) | ||
| my_xml_norm_text(&a); | ||
| if (a.beg != a.end) | ||
| { | ||
| my_xml_value(p,a.beg,(size_t) (a.end-a.beg)); | ||
| if (MY_XML_OK != my_xml_value(p,a.beg,(size_t) (a.end-a.beg))) | ||
| return MY_XML_ERROR; | ||
| } |
No description provided.