Volcano Optimizer generator
大概就是按照能看的懂的结构文章读了一遍。 这是一个框架,而不是一个具体的优化器,优化器需要通过这个框架来生成。并且这个框架为Optimizer Implementor提供了一定的灵活性,可以根据自己的需要,定义部分需要的数据结构比如Cost等,这样就可以很方便的为某个DBMS构建一个优化器。
Abstract
optmization generator:
- Data model
- logical algebra
- physical algebra
- optimization rules
用这四个就可以生成optimizer source code。
比起EXODUS,它更exentisble,powerful.
- cost model for physical properties
- dynamic programming with goal-directed search and branch-and-bound pruning.
Introduction
Volcano vs EXODUS improvement
- Volcano optgen可以在Volcano proj中使用,同时也可以作为一个stand-alone tool
- more efficient, 包括优化时间以及搜索plan所消耗的内存
- 对physical properties提供effective ,efficient, extensible的支持
-
支持heuristics搜索,或者data model 语义guide的搜索, 以及裁剪无效的搜索空间
- 支持更灵活的cost model ?允许生成dynamic plan for imcomplete specified queries
The outside view of the Volcano Optimizer Generator
从DBMS implementor或者说query optimizer implementor的角度来看 Volcano optgen 整个流程就是Optmizer Generator根据数据模型生成 Optmizer的源代码,然后这一部分源代码与DBMS的其他代码一起编译链接最后作为一个完整的DBMS。 之后普通的数据库用户使用的时候,Optimizer会把Query转换成Plan。
Design Principles
5 Fundamental design decisions
- logical algebra(query) -> physical algebra(plan consisting of algorithms)
- rules? modularity
- algebraic equivalences, 就是比 intermdiate levels先进,类似于SQL获取数据类比于程序员用for循环找数据一样。
- rules interpretation vs compilation. compilation运行更快
- search engine使用dynamic programming
Optimizer Generator Input and Optimizer Operation
这一小节将会介绍一下Optimizer的一些内容,以及为了实现这些内容Optimizer Implementor需要提供哪些东西给 Volcano Optimizer Gernerator
Query: an algebra expression(tree) of logical operators (AST)
optimiza map a logical algebra expression into the optimal equivalent physical algebra expression
transformation rules: logical algebra expression <-> logical algebra expression implementation rules: logical operators -> algorithms(physical )
properties: result of expressions
- logical properties
- physical properties
equivalence classes: sets of equivalent logical expressions and plans
physical property vector:用于保存中间的结果的properties的集合,由optimizer implementor来实现,对于opt gen来说这个就是Abstract Data Type
enforcer:本质是一operator。只是存在于physical algebra而与logical algebra没关系的一种operator。
optimization goal( subgoal) is a pair of (logical expression, physical property vector)
applicability function: whether or not an algorithm or enforcer can be used to execute the root node of a logical expression.
cost: 执行一个algrothm或者enforcer所需要的代价,由optimizer implementor 来实现。
property function: logical and physical properties are derived using property functions. There must be one property function for each logical operator , algorithm, enforcer.
- logical properties are based on logical expresion, before any optimization is performed.
- physical properties can only be determined after an execution plan has been chosen. optimizer verify the physical properties of a chosen plan really do satisfy the physical property vector
在本小节的最后小结一下需要Optimizer Implementor需要提供哪些东西给 optgen:
- a set of logical operators
- algebraic transformation rules possible with condition code.(logical) 目前还不太明白这个 condition code是什么的一个东西。可能可以看看opttoy的代码学习一下。
- a set of algorithms and enforcers (physical)
- implementation rules, possibly with condition code.
- an ADT “cost” ith functions for basic arithmetic and comparision
- an ADT “logical properties”
- an ADT “physical properties vector” including comparison functions( equality and cover)
- an aplicability function for each algorithm and enforcer.
- a cost function for each algorithm and enforcer
- a property function for each operator algorithm and enforcer.
以上的10点,就可以理解为前面的Outline里的 Model Specification。
The Search Engine
- dynamic programing里加了一些限制减小搜索空间 search engine and its algorithm are central components of any query optimizer. Volcano provided a search engine for generated optimizers.
dynamic programming and to be very goal-oriented, i.e., driven by needs rather than by possibilities.
只管那些是更大的subquery(或者整个query)的一部分的subquery,就可以减少搜索空间。 Volcano search algorithm uses backward chaining, it explores only those subqueries and plans that truly participate in a larger expression;
- 用hash table存起来了equivalent classes避免同样的东西被搜索多次。
FindBestPlan 原文是用的伪代码,但是哦更喜欢用python写伪代码 抄了一遍,看懂了,但没完全看懂。# TODO 等把Cascades也看完了,回头再改。
# look_up_table looks like:
# look_up_table = {"(LogExpr, PhysPro)": cost}
def FindBestPlan(logExpr, PhysPro, Limit):
if (LogExpr, PhysPro) in look_up_table:
if look_up_table[(LogExpr, PhysPro)] < Limit:
return Plan, Cost # where does the Plan from?
else:
return failure
else: # optimization required.
pass
moves = ["tranformation", "algorithm", "enforcer"]
moves = sorted_by_promising_decs(moves)
# moves[0] stand for the most promising move
if moves[0] == "transformation":
LogExpr = transformation(LogExpr)
cost, plan = FindBestPlan(LogExpr, PhysPro, Limit)
elif moves[0] == "algorithm":
total_cost = cost("algorithm")
for I in moves[0].inputs:
while total_cost < Limit:
PP = determine_required_physical_properties(I)
_, cost = FindBestPlan(I, PP, Limit-TotalCost)
total_cost += cost
else: # moves[0] == "enforcer"
total_cost = cost("enforcer")
PhysPro = modify("enforcer")
cost, plan = FindBestPlan(LogExpr, PhysPro, Limit-TotalCost)
if logExpr not in look_up_table:
look_up_table[(LogExpr, PhysPro)] = total_cost
plan, cost = found_best_plan(look_up_table)
return Plan, Cost
Reference
- The Volcano Optimizer Generator: Extensibility and efficient Search, Graefe,1993