Skip to content

MDEV-37262 XMLISVALID() schema validation function#5171

Open
abarkov wants to merge 5 commits into
mainfrom
bb-12.3-mdev-37262-hf
Open

MDEV-37262 XMLISVALID() schema validation function#5171
abarkov wants to merge 5 commits into
mainfrom
bb-12.3-mdev-37262-hf

Conversation

@abarkov
Copy link
Copy Markdown
Contributor

@abarkov abarkov commented Jun 3, 2026

No description provided.

Alexey Botchkov added 5 commits February 22, 2026 22:52
XMLTYPE column added.
Type_handler::get_column_attributes() added so parser can check
if unexpected attributes were specified for the UDT column.
XMLVALID function added to the XMLTYPE plugin.
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Alexey Botchkov seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new 'xmltype' data type plugin and an 'XMLISVALID' function, alongside a validation mechanism to prevent unsupported attributes (like length, decimals, character set, or SRID) on various data types (e.g., 'vector', 'INET4', 'INET6', 'UUID'). While the changes are valuable, several critical issues must be addressed: the source file 'item_func_xml_isvalid.cc' is missing from the PR, causing compilation failures; the 'typeid' comparison in 'Item_char_typecast::eq' incorrectly compares pointer types instead of dereferenced objects; a potential null pointer dereference exists on 'last_field' in 'LEX::set_field_type_udt'; and there is a potential data race on the static 'xml_handler' initialization. Additionally, checking the string length before parsing in 'Item_xmltype_typecast::val_str' would improve consistency and robustness.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1335 USA

MYSQL_ADD_PLUGIN(type_xmltype
plugin.cc sql_type_xmltype.cc item_func_xml_isvalid.cc
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The file item_func_xml_isvalid.cc is listed as a source file for the type_xmltype plugin, but it is missing from the pull request. This will cause compilation and linking to fail because the implementation of Item_func_xml_isvalid methods (such as val_bool and fix_length_and_dec) is not provided.

Comment thread sql/item_timefunc.cc
return 1;
if (item->type() != FUNC_ITEM ||
functype() != ((Item_func*)item)->functype())
if (typeid(this) != typeid(item))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using typeid(this) != typeid(item) compares the pointer types (Item_char_typecast* vs const Item*) instead of the dynamic types of the objects themselves. Since these pointer types are statically different, this check will always evaluate to true (and return 0), breaking the correctness of the eq method. To compare the dynamic types of the polymorphic objects, you must dereference the pointers.

  if (typeid(*this) != typeid(*item))

if (!res)
return NULL;

if (check_parse_xml(res->ptr(), res->length(), res->charset()) != MY_XML_OK)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with Field_xmltype::store and to prevent potential null-pointer dereferences or unnecessary parsing of empty/too-short strings, check if res->length() < 4 before calling check_parse_xml.

  if (res->length() < 4 ||
      check_parse_xml(res->ptr(), res->length(), res->charset()) != MY_XML_OK)

Comment thread sql/sql_lex.cc
Comment on lines +13100 to +13101
column_attributes|= last_field->get_attr_uint32(0) ?
Type_handler::ATTR_SRID : 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Defensive programming: last_field can potentially be NULL in certain parser contexts or error states. It should be checked for NULL before calling get_attr_uint32(0) to avoid a potential null pointer dereference crash.

  column_attributes|= (last_field && last_field->get_attr_uint32(0)) ?
                        Type_handler::ATTR_SRID : 0;

Comment thread sql/item_xmlfunc.cc
Comment on lines +3082 to +3083
if (!xml_handler)
xml_handler= Type_handler::handler_by_name(thd, name);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a potential data race on the static member Item_func_xml_update::xml_handler when multiple threads concurrently execute fix_length_and_dec and initialize it. While the race might be benign (writing the same pointer), it is technically undefined behavior in C++ and can be flagged by TSAN. Consider making xml_handler a std::atomic<const Type_handler*> or using a thread-safe initialization pattern.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an XMLTYPE data type plugin and an XMLISVALID() function for XML Schema validation, while extending the SQL parser/type system to better control which column/type attributes are allowed for plugin-defined (UDT) types.

Changes:

  • Add a new type_xmltype plugin implementing XMLTYPE, including casting behavior and replication/type aggregation support.
  • Add XMLISVALID(xml, schema[, element]) as a native function with extensive mysql-test coverage for validation behavior.
  • Introduce attribute-allowlisting for UDTs via Type_handler::get_column_attributes() and enforce it in the parser, with a new ER_UNSUPPORTED_DATA_TYPE_ATTRIBUTE error.

Reviewed changes

Copilot reviewed 36 out of 37 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
strings/xml.c Propagate XML value-handler failures for text nodes during parsing.
sql/table.cc Validate assignability of VCOL default expressions when unpacking from FRM.
sql/sql_yacc.yy Extend UDT grammar to accept opt_binary and adjust parser expectations.
sql/sql_type.h Add column_attributes flags and default get_column_attributes() API.
sql/sql_type_vector.h Restrict VECTOR UDT attributes to length-only.
sql/sql_type_vector.cc Tighten VECTOR attribute handling and enforce binary charset internally.
sql/sql_type_geom.h Restrict GEOMETRY attributes (length/dec/srid).
sql/sql_type_fixedbin.h Restrict fixed binary type bundle attributes (none).
sql/sql_lex.h Extend LEX APIs to pass charset/collation attributes into UDT type setup.
sql/sql_lex.cc Enforce UDT attribute allowlist and add CAST UDT charset validation.
sql/share/errmsg-utf8.txt Add ER_UNSUPPORTED_DATA_TYPE_ATTRIBUTE message.
sql/item_xmlfunc.h Adjust UpdateXML item typing to support XMLTYPE result typing.
sql/item_xmlfunc.cc Implement UpdateXML result type handler lookup/caching.
sql/item_timefunc.h Expose Item_char_typecast::print_charset() for reuse by XMLTYPE cast printing.
sql/item_timefunc.cc Refactor charset-printing out of Item_char_typecast::print().
plugin/type_xmltype/CMakeLists.txt Build definition for new XMLTYPE plugin sources.
plugin/type_xmltype/plugin.cc Register xmltype data type plugin and XMLISVALID function plugin.
plugin/type_xmltype/sql_type_xmltype.h Declare Type_handler_xmltype, Field_xmltype, and XMLISVALID item types.
plugin/type_xmltype/sql_type_xmltype.cc Implement XMLTYPE handler/field/typecast logic and XML well-formedness checks.
plugin/type_xmltype/item_func_xml_isvalid.cc Implement XML Schema parsing and XML validation execution for XMLISVALID().
plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype.test Add test coverage for XMLTYPE DDL/DML, casting, and basic validation.
plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype.result Expected results for XMLTYPE test suite.
plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype_validation.test Extensive XMLISVALID schema validation test cases.
plugin/type_xmltype/mysql-test/type_xmltype/type_xmltype_validation.result Expected results for validation suite.
plugin/type_xmltype/mysql-test/type_xmltype/suite.pm Register the new mysql-test suite for XMLTYPE as default.
plugin/type_uuid/mysql-test/type_uuid/type_uuid.test Add attribute-rejection tests using new error code for UUID UDT.
plugin/type_uuid/mysql-test/type_uuid/type_uuid.result Expected results for UUID attribute-rejection tests.
plugin/type_test/plugin.cc Declare allowed attributes for test UDT handlers.
plugin/type_test/mysql-test/type_test/type_test_int8.test Add attribute-rejection tests for TEST_INT8 UDT.
plugin/type_test/mysql-test/type_test/type_test_int8.result Expected results for TEST_INT8 attribute-rejection tests.
plugin/type_inet/mysql-test/type_inet/type_inet6.test Add attribute-rejection tests for INET6 and adjust a prior test to new behavior.
plugin/type_inet/mysql-test/type_inet/type_inet6.result Expected results for INET6 attribute-rejection tests.
plugin/type_inet/mysql-test/type_inet/type_inet4.test Add attribute-rejection tests for INET4.
plugin/type_inet/mysql-test/type_inet/type_inet4.result Expected results for INET4 attribute-rejection tests.
mysql-test/main/vector2.test Update VECTOR attribute error expectation to new error code.
mysql-test/main/vector2.result Expected results reflecting new VECTOR attribute error message.
include/my_xml.h Add a new parser flag constant intended for “assert well formed” behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread include/my_xml.h
Comment on lines 39 to 43
\r, \n, \t characters.
*/
#define MY_XML_FLAG_SKIP_TEXT_NORMALIZATION 2
#define MY_XML_FLAG_ASSERT_WELL_FORMED 4

Comment thread sql/item_xmlfunc.cc
Comment on lines +3075 to 3092
const Type_handler *Item_func_xml_update::xml_handler= NULL;


bool Item_func_xml_update::fix_length_and_dec(THD *thd)
{
static LEX_CSTRING name= {STRING_WITH_LEN("XMLTYPE")};

if (!xml_handler)
xml_handler= Type_handler::handler_by_name(thd, name);

return Item_xml_str_func::fix_length_and_dec(thd);
}


const Type_handler *Item_func_xml_update::type_handler() const
{
return xml_handler ? xml_handler : Item_xml_str_func::type_handler();
}
Comment on lines +3582 to +3586
if (!args[1]->const_item())
{
my_printf_error(ER_UNKNOWN_ERROR,
"Only constant XML Schema-s are supported.", MYF(0));
return TRUE;
Comment thread strings/xml.c
Comment on lines 478 to 484
if (!(p->flags & MY_XML_FLAG_SKIP_TEXT_NORMALIZATION))
my_xml_norm_text(&a);
if (a.beg != a.end)
{
my_xml_value(p,a.beg,(size_t) (a.end-a.beg));
if (MY_XML_OK != my_xml_value(p,a.beg,(size_t) (a.end-a.beg)))
return MY_XML_ERROR;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants