[MySQL Source Code] Is UNION Much Slower Than UNION ALL?

Source: SegmentFault

Date: 2023-01-11 10:57:19

Original post: [MySQL Source Code] Is UNION Much Slower Than UNION ALL?

You are welcome to visit my personal blog: http://blog.duhbb.com/

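Introduction

This post looks at the difference between UNION and UNION ALL in MySQL from the source code's point of view, and reaches the following conclusions. Both UNION and UNION ALL set up a temporary table, but they use it differently, and their query plans differ. By default, UNION creates a temporary table keyed on the returned columns, and the so-called filtering of duplicates is simply the act of inserting rows into that table. The container that holds the temporary table's rows is actually an unordered_set, and there is a storage engine dedicated to temporary tables, called TempTable. UNION ALL, as long as its rows can be streamed, reads the tables' rows directly and returns them to the client without going through the temporary table. Which of the two to use still depends on what you need: if you do not need DISTINCT semantics and the data set is small, consider UNION ALL.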

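The difference between UNION and UNION ALL

The only difference between UNION and UNION ALL is that UNION ALL does not remove duplicate rows or records; it selects every row from all tables that satisfies your query conditions and combines them into the result table.

You will often read that UNION does not work on columns with a text data type while UNION ALL works on columns of every data type. That restriction comes from other database systems and does not apply to MySQL, where UNION can also deduplicate TEXT columns.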
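
To make the difference concrete, here is a minimal sketch of the two semantics on in-memory rows. It is only a model of the result sets, not of how MySQL executes the query; Row, union_all and union_distinct are names made up for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A row is modelled as a list of column values (illustrative only).
using Row = std::vector<std::string>;

// UNION ALL: every row of both inputs, duplicates kept.
std::vector<Row> union_all(const std::vector<Row> &a,
                           const std::vector<Row> &b) {
  std::vector<Row> out(a);
  out.insert(out.end(), b.begin(), b.end());
  return out;
}

// UNION (DISTINCT): the same rows, but each distinct row is kept only once.
std::vector<Row> union_distinct(const std::vector<Row> &a,
                                const std::vector<Row> &b) {
  std::set<Row> seen;  // rows that have already been emitted
  std::vector<Row> out;
  for (const Row &row : union_all(a, b)) {
    if (seen.insert(row).second) out.push_back(row);  // first occurrence only
  }
  return out;
}
```

With a two-row input unioned with itself, union_all returns four rows and union_distinct returns the original two.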

The official MySQL description

In section 12.5, "Non-Subquery UNION Execution", the official MySQL documentation says:

Non-subquery unions are done in ...

void Query_expression::create_access_paths(THD *thd) {

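  // Decide whether we can stream rows, i.e., never actually put them into
  // the temporary table. If we can, we materialize the UNION DISTINCT blocks
  // first, and then append any remaining UNION ALL blocks
  // by means of AppendIterator.
  //
  // If we cannot stream, i.e., every block has to go into the temporary table,
  // our strategy for mixed UNION ALL/DISTINCT is a bit different;
  // see MaterializeIterator for details.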
  bool streaming_allowed = true;
  if (global_parameters()->order_list.size() != 0) {
    // If we're sorting, we currently put it in a real table no matter what.
    // This is a legacy decision, because we used to not know whether filesort
    // would want to refer to rows in the table after the sort (sort by row ID).
    // We could probably be more intelligent here now.
    streaming_allowed = false;
  }


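  // (preceding code omitted)

  // If streaming is allowed, we can stream every part that is UNION ALL;
  // everything else has to go through the temporary table.
  //
  // Handle all the query blocks that we need to materialize.
  // This may be only the UNION DISTINCT query blocks, or all blocks.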

  if (union_distinct != nullptr || ! streaming_allowed) {
    Mem_root_array<MaterializePathParameters::QueryBlock> query_blocks =
        setup_materialization(thd, tmp_table, streaming_allowed);

  // (following code omitted)
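
As a rough model of the decision described in the two excerpts above, the sketch below classifies each query block as materialized or streamed. It is not MySQL code: BlockPlan, plan_blocks and the last_deduplicated_block index are hypothetical names invented for illustration, and it encodes only the rules stated in the comments above (an ORDER BY on the whole union disables streaming, blocks that need deduplication are materialized, and any remaining UNION ALL blocks are streamed).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// How a single query block of the union is executed in this simplified model.
enum class BlockPlan { Materialize, Stream };

// block_count:             number of query blocks in the union.
// last_deduplicated_block: index of the last block whose rows must be
//                          deduplicated, or -1 for a pure UNION ALL
//                          (an assumption made for this sketch).
// has_order_by:            whether the union as a whole has an ORDER BY.
std::vector<BlockPlan> plan_blocks(std::size_t block_count,
                                   int last_deduplicated_block,
                                   bool has_order_by) {
  // Mirrors the excerpt: sorting currently forces a real temporary table.
  const bool streaming_allowed = !has_order_by;
  std::vector<BlockPlan> plan;
  plan.reserve(block_count);
  for (std::size_t i = 0; i < block_count; ++i) {
    const bool needs_dedup =
        last_deduplicated_block >= 0 &&
        i <= static_cast<std::size_t>(last_deduplicated_block);
    plan.push_back(needs_dedup || !streaming_allowed ? BlockPlan::Materialize
                                                     : BlockPlan::Stream);
  }
  return plan;
}
```

In this model a plain two-block UNION ALL streams both blocks, a two-block UNION materializes both, and adding an ORDER BY to the UNION ALL moves both of its blocks into the temporary table.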

Reading the code alone is rather confusing, so let's step through it in a debugger.

  if (! simple_query_expression) {
    /*
      Check that it was possible to aggregate all collations together for UNION.
      We need this in case of UNION DISTINCT, to filter out duplicates using
      the proper collation.

      TODO: consider removing this test in case of UNION ALL.
    */
    for (Item *type : types) {
      if (type->result_type() == STRING_RESULT &&
          type->collation.derivation == DERIVATION_NONE) {
        my_error(ER_CANT_AGGREGATE_NCOLLATIONS, MYF(0), "UNION");
        return true;
      }
    }
    ulonglong create_options =
        first_query_block()->active_options() | TMP_TABLE_ALL_COLUMNS;

    if (union_result->create_result_table(thd, types, union_distinct != nullptr,
                                          create_options, "", false,
                                          instantiate_tmp_table))
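
The comment in the excerpt above says that UNION DISTINCT needs a single agreed collation in order to filter out duplicates. The sketch below shows why: whether two strings count as the same row depends on the comparison rule. CaseInsensitiveLess is a hypothetical stand-in for a case-insensitive collation and count_distinct is a helper invented for this example; neither comes from MySQL.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a case-insensitive collation (ASCII only).
struct CaseInsensitiveLess {
  bool operator()(const std::string &a, const std::string &b) const {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
          return std::tolower(x) < std::tolower(y);
        });
  }
};

// Counts how many values remain after deduplication under the ordering Less.
template <typename Less>
std::size_t count_distinct(const std::vector<std::string> &values) {
  std::set<std::string, Less> seen(values.begin(), values.end());
  return seen.size();
}
```

The same three values "abc", "ABC" and "abd" give three distinct rows under a byte-wise comparison but only two under the case-insensitive one, so the result of a UNION is only well defined once the collation has been settled.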

The statement executed here is:

select * from student union select * from student;

We can see that a temporary table is indeed created here, inside

bool Query_expression::prepare(THD *thd, Query_result *sel_result,

The temporary table is created by calling this function:

/**
  Create a temp table according to a field list.

  Given field pointers are changed to point at tmp_table for
  send_result_set_metadata. The table object is self contained: it's
  allocated in its own memory root, as well as Field objects
  created for table columns. Those Field objects are common to TABLE and
  TABLE_SHARE.
  This function will replace Item_sum items in 'fields' list with
  corresponding Item_field items, pointing at the fields in the
  temporary table, unless save_sum_fields is set to false.
  The Item_field objects are created in THD memory root.

  @param thd                  thread handle
  @param param                a description used as input to create the table
  @param fields               list of items that will be used to define
                              column types of the table (also see NOTES)
  @param group                Group key to use for temporary table, NULL if
  none
  @param distinct             should table rows be distinct
  @param save_sum_fields      see NOTES
  @param select_options
  @param rows_limit
  @param table_alias          possible name of the temporary table that can
                              be used for name resolving; can be "".

  @remark mysql_create_view() checks that views have less than
          MAX_FIELDS columns.

  @remark We may actually end up with a table without any columns at all.
          See comment below: We don't have to store this.
*/

#define STRING_TOTAL_LENGTH_TO_PACK_ROWS 128
#define AVG_STRING_LENGTH_TO_PACK_ROWS 64
#define RATIO_TO_PACK_ROWS 2

TABLE *create_tmp_table(THD *thd, Temp_table_param *param,
                        const mem_root_deque<Item *> &fields, ORDER *group,
                        bool distinct, bool save_sum_fields,
                        ulonglong select_options, ha_rows rows_limit,
                        const char *table_alias) {

One part of it looks like this:

  } else if (distinct && share->fields != param->hidden_field_count) {
    /*
      Create an unique key or an unique constraint over all columns
      that should be in the result.  In the temporary table, there are
      'param->hidden_field_count' extra columns, whose null bits are stored
      in the first 'hidden_null_pack_length' bytes of the row.
    */
    DBUG_PRINT("info", ("hidden_field_count: %d", param->hidden_field_count));
    share->keys = 1;
    share->is_distinct = true;
    if (! unique_constraint_via_hash_field) {
      param->keyinfo->table = table;
      param->keyinfo->is_visible = true;
      param->keyinfo->user_defined_key_parts =
          share->fields - param->hidden_field_count;
      param->keyinfo->actual_key_parts = param->keyinfo->user_defined_key_parts;
      KEY_PART_INFO *key_part_info = share->mem_root.ArrayAlloc<KEY_PART_INFO>(
          param->keyinfo->user_defined_key_parts);
      if (key_part_info == nullptr) return nullptr;
      param->keyinfo->key_part = key_part_info;
      param->keyinfo->flags = HA_NOSAME | HA_NULL_ARE_EQUAL;
      param->keyinfo->actual_flags = param->keyinfo->flags;
      param->keyinfo->name = ...

  // (following code omitted)

See also: [MySQL Documentation Translation] Understanding the Query Plan

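
The excerpt above builds a unique key over all visible columns, with the flags HA_NOSAME | HA_NULL_ARE_EQUAL, so a duplicate row is rejected at insert time and two NULLs in the same column count as equal. A minimal sketch of that idea, using an unordered_set as the row container, might look like this. DistinctTable, write_row and RowHash are hypothetical names for illustration and are not the TempTable engine's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <unordered_set>
#include <vector>

// A nullable integer column value; std::nullopt plays the role of SQL NULL.
using Cell = std::optional<int>;
using Row = std::vector<Cell>;

// Hashes a whole row, so the "key" covers every column of the result.
struct RowHash {
  std::size_t operator()(const Row &row) const {
    std::size_t h = 0;
    for (const Cell &cell : row) {
      // NULL gets a fixed hash value, so NULLs in the same column collide.
      const std::size_t cell_hash =
          cell ? std::hash<int>{}(*cell) : std::size_t{0x9e3779b9u};
      h ^= cell_hash + 0x9e3779b9u + (h << 6) + (h >> 2);
    }
    return h;
  }
};

class DistinctTable {
 public:
  // Returns true if the row was stored, false if an equal row already exists.
  // Two empty optionals compare equal, which models "NULLs are equal".
  bool write_row(const Row &row) { return rows_.insert(row).second; }

  std::size_t size() const { return rows_.size(); }

 private:
  std::unordered_set<Row, RowHash> rows_;
};
```

Deduplication here is nothing more than attempting the insert: writing the same row twice leaves one copy in the table, which is the sense in which UNION "filters" by inserting into the temporary table.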
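Summary

Both UNION and UNION ALL set up a temporary table, but they use it differently.
The two have different query plans.
By default, UNION creates a temporary table keyed on the returned columns; filtering out duplicates simply means inserting the rows into that table.
The container that holds the temporary table's rows is actually an unordered_set.
There is a storage engine dedicated to temporary tables, called TempTable.
UNION ALL, as long as its rows can be streamed, reads the tables' rows directly and returns them to the client without going through the temporary table.
Which of the two to use still depends on what you need: if you do not need DISTINCT semantics and the data set is small, consider UNION ALL.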