Volcano Optimizer generator

大概就是按照能看的懂的结构文章读了一遍。 这是一个框架,而不是一个具体的优化器,优化器需要通过这个框架来生成。并且这个框架为Optimizer Implementor提供了一定的灵活性,可以根据自己的需要,定义部分需要的数据结构比如Cost等,这样就可以很方便的为某个DBMS构建一个优化器。

Abstract

optmization generator:

  • Data model
  • logical algebra
  • physical algebra
  • optimization rules

用这四个就可以生成optimizer source code。

比起EXODUS,它更exentisble,powerful.

  • cost model for physical properties
  • dynamic programming with goal-directed search and branch-and-bound pruning.

Introduction

Volcano vs EXODUS improvement

  1. Volcano optgen可以在Volcano proj中使用,同时也可以作为一个stand-alone tool
  2. more efficient, 包括优化时间以及搜索plan所消耗的内存
  3. 对physical properties提供effective ,efficient, extensible的支持
  4. 支持heuristics搜索,或者data model 语义guide的搜索, 以及裁剪无效的搜索空间

  5. 支持更灵活的cost model ?允许生成dynamic plan for imcomplete specified queries

The outside view of the Volcano Optimizer Generator

从DBMS implementor或者说query optimizer implementor的角度来看 Volcano optgen opt_gen_overview 整个流程就是Optmizer Generator根据数据模型生成 Optmizer的源代码,然后这一部分源代码与DBMS的其他代码一起编译链接最后作为一个完整的DBMS。 之后普通的数据库用户使用的时候,Optimizer会把Query转换成Plan。

Design Principles

5 Fundamental design decisions

  1. logical algebra(query) -> physical algebra(plan consisting of algorithms)
  2. rules? modularity
  3. algebraic equivalences, 就是比 intermdiate levels先进,类似于SQL获取数据类比于程序员用for循环找数据一样。
  4. rules interpretation vs compilation. compilation运行更快
  5. search engine使用dynamic programming

Optimizer Generator Input and Optimizer Operation

这一小节将会介绍一下Optimizer的一些内容,以及为了实现这些内容Optimizer Implementor需要提供哪些东西给 Volcano Optimizer Gernerator

Query: an algebra expression(tree) of logical operators (AST)

optimiza map a logical algebra expression into the optimal equivalent physical algebra expression

transformation rules: logical algebra expression <-> logical algebra expression implementation rules: logical operators -> algorithms(physical )

properties: result of expressions

  • logical properties
  • physical properties

equivalence classes: sets of equivalent logical expressions and plans

physical property vector:用于保存中间的结果的properties的集合,由optimizer implementor来实现,对于opt gen来说这个就是Abstract Data Type

enforcer:本质是一operator。只是存在于physical algebra而与logical algebra没关系的一种operator。

optimization goal( subgoal) is a pair of (logical expressionphysical property vector

applicability function: whether or not an algorithm or enforcer can be used to execute the root node of a logical expression.

cost: 执行一个algrothm或者enforcer所需要的代价,由optimizer implementor 来实现。

property function: logical and physical properties are derived using property functions. There must be one property function for each logical operator , algorithm, enforcer.

  • logical properties are based on logical expresion, before any optimization is performed.
  • physical properties can only be determined after an execution plan has been chosen. optimizer verify the physical properties of a chosen plan really do satisfy the physical property vector

在本小节的最后小结一下需要Optimizer Implementor需要提供哪些东西给 optgen:

  1. a set of logical operators
  2. algebraic transformation rules possible with condition code.(logical) 目前还不太明白这个 condition code是什么的一个东西。可能可以看看opttoy的代码学习一下。
  3. a set of algorithms and enforcers (physical)
  4. implementation rules, possibly with condition code.
  5. an ADT “cost” ith functions for basic arithmetic and comparision
  6. an ADT “logical properties”
  7. an ADT “physical properties vector” including comparison functions( equality and cover)
  8. an aplicability function for each algorithm and enforcer.
  9. a cost function for each algorithm and enforcer
  10. a property function for each operator algorithm and enforcer.

以上的10点,就可以理解为前面的Outline里的 Model Specification。

The Search Engine

  1. dynamic programing里加了一些限制减小搜索空间 search engine and its algorithm are central components of any query optimizer. Volcano provided a search engine for generated optimizers.

dynamic programming and to be very goal-oriented, i.e., driven by needs rather than by possibilities.

只管那些是更大的subquery(或者整个query)的一部分的subquery,就可以减少搜索空间。 Volcano search algorithm uses backward chaining, it explores only those subqueries and plans that truly participate in a larger expression;

  1. 用hash table存起来了equivalent classes避免同样的东西被搜索多次。

FindBestPlan 原文是用的伪代码,但是哦更喜欢用python写伪代码 抄了一遍,看懂了,但没完全看懂。# TODO 等把Cascades也看完了,回头再改。

# look_up_table looks like:
# look_up_table = {"(LogExpr, PhysPro)": cost}

def FindBestPlan(logExpr, PhysPro, Limit):
    if (LogExpr, PhysPro) in look_up_table:
        if look_up_table[(LogExpr, PhysPro)] < Limit:
            return Plan, Cost # where does the Plan from?
        else:
            return failure
    else: # optimization required.
        pass
    moves = ["tranformation", "algorithm", "enforcer"]
    moves = sorted_by_promising_decs(moves)
    # moves[0] stand for the most promising move
    if moves[0] == "transformation":
        LogExpr = transformation(LogExpr)
        cost, plan = FindBestPlan(LogExpr, PhysPro, Limit)
    elif moves[0] == "algorithm":
        total_cost = cost("algorithm")
        for I in moves[0].inputs:
            while total_cost < Limit:
                PP = determine_required_physical_properties(I)
                _, cost = FindBestPlan(I, PP, Limit-TotalCost)
                total_cost += cost
    else: # moves[0] == "enforcer"
        total_cost = cost("enforcer")
        PhysPro = modify("enforcer")
        cost, plan = FindBestPlan(LogExpr, PhysPro, Limit-TotalCost)
    if logExpr not in look_up_table:
        look_up_table[(LogExpr, PhysPro)] = total_cost
    plan, cost = found_best_plan(look_up_table)
    return Plan, Cost

    

Reference

  1. The Volcano Optimizer Generator: Extensibility and efficient Search, Graefe,1993