[Jifty-commit] r5839 - in Jifty-DBI/branches/tisql: doc/tisql lib/Jifty/DBI t/tisql

Jifty commits jifty-commit at lists.jifty.org
Mon Sep 15 01:21:18 EDT 2008


Author: ruz
Date: Mon Sep 15 01:21:17 2008
New Revision: 5839

Added:
   Jifty-DBI/branches/tisql/t/tisql/bench_tags.t
Modified:
   Jifty-DBI/branches/tisql/   (props changed)
   Jifty-DBI/branches/tisql/doc/tisql/joins_reusing.txt
   Jifty-DBI/branches/tisql/lib/Jifty/DBI/Collection.pm
   Jifty-DBI/branches/tisql/lib/Jifty/DBI/Tisql.pm
   Jifty-DBI/branches/tisql/t/tisql/searches_tags.t

Log:
* push joins bundling

 r5794 at ruslan-zakirovs-computer:  ruz | 2008-09-15 09:15:01 +0400
 * implement and document joins bundling
 r5795 at ruslan-zakirovs-computer:  ruz | 2008-09-15 09:17:18 +0400
 * more tests
 r5796 at ruslan-zakirovs-computer:  ruz | 2008-09-15 09:17:40 +0400
 * simple benchmark


Modified: Jifty-DBI/branches/tisql/doc/tisql/joins_reusing.txt
==============================================================================
--- Jifty-DBI/branches/tisql/doc/tisql/joins_reusing.txt	(original)
+++ Jifty-DBI/branches/tisql/doc/tisql/joins_reusing.txt	Mon Sep 15 01:21:17 2008
@@ -1,34 +1,117 @@
-Ok, we've figured out syntax, let's play with number of joins.
+=head1 JOINS REUSING
 
-'.tag.value = "zoo" OR .tag.value = "bar"'
+=head2 Introduction
 
-This is the simplest query when we can bundle conditions and use on
-join, but this rule applies only to OR binary operator, when AND
-operator with positive conditions requires at least two joins. In the
-case of AND you have to use different joins as the same record you
-join can not be 'foo' and 'bar' at the same time.
-
-'.tag.value != "zoo"'
-
-Do you remember I wrote about positive conditions is such queries. I
-was talking in such way because if we implement this condition in the
-same way as '.tag.value = "zoo"' then we'll get wrong results, we'll
-find all objects that have at least one tag that is not 'zoo' instead
-we have to use positive condition in a left join and check that the
-right part is empty. Something like:
- SELECT o.* FROM o LEFT JOIN t ON t.object = o.id AND t.value = 'zoo'
-WHERE t.value IS NULL;
-Note that we invert operator, so query is more closer to "NOT
-(.tag.value = 'zoo')"
-
-'.tag.value != "zoo" AND .tag.value != "bar"'
-
-Smoothly we move to bundling negative conditions joined by AND binary
-operator. Using boolean logic we can rewrite the query into 'NOT(
-.tag.value = "zoo" OR .tag.value = "bar" )' and things are clearer
-now.
-
-What about more complex examples? I use Parse::BooleanLogic, then I've
-implemented filter that  leaves two conditions in the structure, so we
-can figure out relation between any pair of conditions in the query.
+Let's start with a simple example: tags.
+
+    .tag.value = "foo" OR .tag.value = "bar"
+
+This is the simplest case, where we can bundle conditions and use one
+join for both. The final query will look something like:
+
+    SELECT o.* FROM o LEFT JOIN t ON t.object = o.id
+    WHERE t.value = 'foo' OR t.value = 'bar'
+
+The left join here degenerates into an inner join. JDBI has had this
+optimization for a long time and some DBs do it these days too, so
+in tisql we don't care much.
+
+This rule applies only to the OR binary operator; the AND operator
+with positive conditions requires at least two joins. It's pretty
+obvious from the query above that AND between the two conditions
+results in an impossible condition, and most DBs return an empty set
+without even looking at the data.
+
+The situation is the opposite with the following condition:
+
+    .tag.value != "foo"
+
+As we have seen in the doc describing tisql syntax, this condition
+is equivalent to 'has no .tag.value = "foo"'. All conditions
+can be expressed using either a 'has' or a 'has no' prefix. Conditions
+with the 'has no' prefix look like this in SQL:
+
+    SELECT o.* FROM o LEFT JOIN t ON t.object = o.id AND (t.value = 'foo')
+    WHERE t.id IS NULL
+
+It's a check that a related collection with some filter applied is
+empty. If we put the clause t.value = 'foo' OR t.value = 'bar' into the
+ON clause of the join, we get the tisql expression
+'.tag.value != "foo" AND .tag.value != "bar"'.
+
+=head2 Which combinations of these conditions can be bundled?
+
+    cond1             bop     cond2             can bundle?
+    has    X = A      OR      has    X =  B     1
+    has    X = A      AND     has    X =  B     0
+    has    X = A      OR      has no X =  B     0
+    has    X = A      AND     has no X =  B     0
+    has    X = A      OR      has    X != B     1
+    has    X = A      AND     has    X != B     0
+    has    X = A      OR      has no X != B     0
+    has    X = A      AND     has no X != B     0
+    has no X = A      OR      has    X =  B     0
+    has no X = A      AND     has    X =  B     0
+    has no X = A      OR      has no X =  B     0
+    has no X = A      AND     has no X =  B     1
+    has no X = A      OR      has    X != B     0
+    has no X = A      AND     has    X != B     0
+    has no X = A      OR      has no X != B     0
+    has no X = A      AND     has no X != B     1
+    ...
+
+We could continue this table, but it's pretty obvious that conditions
+with the 'has' prefix and an OR relation can be bundled together, as
+can 'has no' conditions with an AND relation.
+
+=head2 Implementation details
+
+Our conditions form a boolean expression, which we can filter to leave
+only the parts we're interested in while keeping their relations.
+Then we can solve the boolean expression by replacing conditions with
+true or false values. This can be used to build our bundled joins.
+
+Let's consider the following tree:
+
+    X AND (( foo AND (X OR bar)) OR zoo)
+
+Here X is some condition, for example .status = 'const'; we're not
+interested in such conditions, so we leave them alone. We start from
+'t.value = "foo"': there are no bundles yet, so we just create a new
+one, generate a new alias for the tags table and apply the condition to
+our query. Then we continue building the query and find "bar". We
+already have a bundle, so we do the following:
+
+    1) filter our query, leaving only the conditions from the bundle
+    and the condition we want to check
+    2) replace all conditions from the bundle with false values
+    3) replace the candidate with a true value
+    4) solve the expression; if the result is true then our candidate
+    can be bundled with this bundle
+    5) otherwise we move to the next bundle
+
+Let's look at the process on our example:
+
+    * X AND (( foo AND (X OR bar)) OR zoo)
+    * foo AND bar
+    * 0 AND bar
+    * 0 AND 1
+    * 0
+
+The expression is solved and we cannot bundle foo and bar. We generate
+a new bundle and continue to the 'zoo' part:
+
+    * X AND (( foo AND (X OR bar)) OR zoo)
+    * foo OR zoo
+    * 0 OR zoo
+    * 0 OR 1
+    * 1
+
+Woot, the zoo condition can use foo's join. The query will look like:
+    
+    
+    SELECT o.* FROM o
+    LEFT JOIN t AS t1 ON t1.object = o.id
+    LEFT JOIN t AS t2 ON t2.object = o.id
+    WHERE X AND (( t1.value = 'foo' AND (X OR t2.value = 'bar')) OR t1.value = 'zoo')
 

Modified: Jifty-DBI/branches/tisql/lib/Jifty/DBI/Collection.pm
==============================================================================
--- Jifty-DBI/branches/tisql/lib/Jifty/DBI/Collection.pm	(original)
+++ Jifty-DBI/branches/tisql/lib/Jifty/DBI/Collection.pm	Mon Sep 15 01:21:17 2008
@@ -2246,7 +2246,7 @@
 sub tisql {
     my $self = shift;
     require Jifty::DBI::Tisql;
-    return Jifty::DBI::Tisql->new( collection => $self );
+    return Jifty::DBI::Tisql->new( joins_bundling => 1, @_, collection => $self );
 }
 
 1;

Modified: Jifty-DBI/branches/tisql/lib/Jifty/DBI/Tisql.pm
==============================================================================
--- Jifty-DBI/branches/tisql/lib/Jifty/DBI/Tisql.pm	(original)
+++ Jifty-DBI/branches/tisql/lib/Jifty/DBI/Tisql.pm	Mon Sep 15 01:21:17 2008
@@ -87,12 +87,16 @@
             $meta->{'name'} = $name;
         }
     }
+    my $operand_cb = sub {
+        my $rv = $self->parse_condition( 
+            $_[0], sub { $self->find_column( $_[0], $tree->{'aliases'} ) }
+        );
+        #push @{ $self->{'cache'}{ $rv->{'lhs'}{'string'} } ||= [] }, $rv;
+        return $rv;
+    };
 
     $tree->{'conditions'} = $self->as_array(
-        $string,
-        operand_cb => sub { return $self->parse_condition( 
-            $_[0], sub { $self->find_column( $_[0], $tree->{'aliases'} ) }
-        ) },
+        $string, operand_cb => $operand_cb,
     );
     $self->{'tisql'}{'conditions'} = $tree->{'conditions'};
     $self->apply_query_tree( $tree->{'conditions'} );
@@ -145,6 +149,38 @@
     }
     $prefix ||= 'has';
 
+    my $bundling = $long && !$join && $self->{'joins_bundling'};
+    my $bundled = 0;
+    if ( $bundling ) {
+        my $bundles = $self->{'cache'}{'condition_bundles'}{ $condition->{'lhs'}{'string'} }{ $prefix } ||= [];
+        foreach my $bundle ( @$bundles ) {
+            my %tmp;
+            $tmp{$_}++ foreach map refaddr($_), @$bundle;
+            my $cur_refaddr = refaddr( $condition );
+            my $filtered = 
+                $self->filter(
+                    $self->{'tisql'}{'conditions'},
+                    sub { my $ra = refaddr($_[0]); return $ra == $cur_refaddr || $tmp{ $ra } },
+                );
+            if ( $prefix eq 'has' ) {
+                next unless $self->solve(
+                    $filtered,
+                    sub { return refaddr($_[0]) != $cur_refaddr },
+                );
+            } else {
+                next if $self->solve(
+                    $filtered,
+                    sub { return refaddr($_[0]) == $cur_refaddr },
+                );
+            }
+            $condition->{'lhs'}{'previous'}{'sql_alias'} = $bundle->[-1]{'lhs'}{'previous'}{'sql_alias'};
+            push @$bundle, $condition;
+            $bundled = 1;
+            last;
+        }
+        push @$bundles, [ $condition ] unless $bundled;
+    }
+
     if ( $prefix eq 'has' ) {
         my %limit = (
             subclause        => 'tisql',
@@ -195,7 +231,7 @@
 
         $collection->limit(
             %limit,
-            entry_aggregator => 'AND',
+            entry_aggregator => $bundled? 'OR': 'AND',
             leftjoin         => $limit{'alias'},
         );
 
@@ -262,7 +298,7 @@
             . (ref($refers) || $refers)
             ."' that is not record or collection";
     }
-    return $res;
+    return $meta->{'sql_alias'} = $res;
 }
 
 sub resolve_tisql_join {

Added: Jifty-DBI/branches/tisql/t/tisql/bench_tags.t
==============================================================================
--- (empty file)
+++ Jifty-DBI/branches/tisql/t/tisql/bench_tags.t	Mon Sep 15 01:21:17 2008
@@ -0,0 +1,205 @@
+#!/usr/bin/env perl -w
+
+use strict;
+use warnings;
+
+use File::Spec;
+use Test::More;
+
+BEGIN { require "t/utils.pl" }
+our (@available_drivers);
+
+use constant TESTS_PER_DRIVER => 1;
+
+my $total = scalar(@available_drivers) * TESTS_PER_DRIVER;
+plan tests => $total;
+
+my @types = qw(article memo note);
+my @tags = qw(foo bar baz ball box apple orange fruit juice pearl gem briliant qwe asd zxc qwerty ytr dsa cxz boo bla);
+my $total_objs = 30000;
+my $max_tags = 3;
+my $time_it = -10;
+
+use Data::Dumper;
+
+foreach my $d ( @available_drivers ) {
+SKIP: {
+    unless( has_schema( 'TestApp', $d ) ) {
+        skip "No schema for '$d' driver", TESTS_PER_DRIVER;
+    }
+    unless( should_test( $d ) ) {
+        skip "ENV is not defined for driver '$d'", TESTS_PER_DRIVER;
+    }
+
+    my $handle = get_handle( $d );
+    connect_handle( $handle );
+    isa_ok($handle->dbh, 'DBI::db');
+
+    my $ret = init_schema( 'TestApp', $handle );
+    isa_ok($ret, 'DBI::st', "Inserted the schema. got a statement handle back");
+
+    {
+        my $count = init_data( 'TestApp::Node', $handle );
+        ok( $count,  "init data" );
+        $count = init_data( 'TestApp::Tag', $handle );
+        ok( $count,  "init data" );
+        $handle->dbh->do("CREATE INDEX tags1 ON tags(value, node)");
+        $handle->dbh->do("CREATE INDEX tags2 ON tags(node, value)");
+    }
+
+    my $clean_obj = TestApp::NodeCollection->new( handle => $handle );
+    my $nodes_obj = $clean_obj->clone;
+    is_deeply( $nodes_obj, $clean_obj, 'after Clone looks the same');
+
+    run_our_cool_tests(
+        $nodes_obj, $handle,
+        '.tags.value = "foo" OR .tags.value = "bar"',
+        '.tags.value != "foo" AND .tags.value != "bar"',
+
+    );
+
+    cleanup_schema( 'TestApp', $handle );
+
+}} # SKIP, foreach blocks
+
+
+use Benchmark qw(cmpthese);
+sub run_our_cool_tests {
+    my $collection = shift;
+    my $handle = shift;
+    my @tests = @_;
+    foreach my $t ( @tests ) {
+        diag "without bundling: ". do {
+            $collection->clean_slate;
+            my $tisql = $collection->tisql;
+            $tisql->{'joins_bundling'} = 0;
+            $tisql->query( $t );
+            $collection->build_select_query;
+        };
+        diag "with    bundling: ". do {
+            $collection->clean_slate;
+            my $tisql = $collection->tisql;
+            $tisql->{'joins_bundling'} = 1;
+            $tisql->query( $t );
+            $collection->build_select_query;
+        };
+        cmpthese( $time_it, {
+            "  $t" => sub { 
+                my $collection = TestApp::NodeCollection->new( handle => $handle );
+                my $tisql = $collection->tisql;
+                $tisql->{'joins_bundling'} = 0;
+                $tisql->query( $t );
+                $collection->next;
+            },
+            "b $t" => sub { 
+                my $collection = TestApp::NodeCollection->new( handle => $handle );
+                my $tisql = $collection->tisql;
+                $tisql->{'joins_bundling'} = 1;
+                $tisql->query( $t );
+                $collection->next;
+            } }
+        );
+    }
+}
+1;
+
+
+package TestApp;
+#sub schema_sqlite { [
+#q{ CREATE table nodes (
+#    id integer primary key,
+#    type varchar(36),
+#    subject varchar(36)
+#) },
+#q{ CREATE table tags (
+#    id integer primary key,
+#    node integer not null,
+#    value varchar(36)
+#) },
+#] }
+
+sub schema_mysql { [
+q{ CREATE table nodes (
+    id integer primary key auto_increment,
+    type varchar(36),
+    subject varchar(36)
+) },
+q{ CREATE table tags (
+    id integer primary key auto_increment,
+    node integer not null,
+    value varchar(36)
+) },
+] }
+sub cleanup_schema_mysql { [
+    "DROP table tags", 
+    "DROP table nodes", 
+] }
+
+package TestApp::TagCollection;
+use base qw/Jifty::DBI::Collection/;
+our $VERSION = '0.01';
+
+package TestApp::NodeCollection;
+use base qw/Jifty::DBI::Collection/;
+our $VERSION = '0.01';
+
+package TestApp::Tag;
+use base qw/Jifty::DBI::Record/;
+our $VERSION = '0.01';
+# definition below
+
+package TestApp::Node;
+use base qw/Jifty::DBI::Record/;
+our $VERSION = '0.01';
+
+BEGIN {
+use Jifty::DBI::Schema;
+use Jifty::DBI::Record schema {
+    column type => type is 'varchar(36)';
+    column subject => type is 'varchar(36)';
+    column tags => refers_to TestApp::TagCollection by 'node';
+};
+}
+
+my @xxx = ('a'..'z');
+sub init_data {
+    my @res = (
+        [ 'type', 'subject' ],
+    );
+    foreach ( 1 .. $total_objs ) {
+        push @res, [ $types[ int rand @types ], $xxx[ int rand @xxx ] ];
+    }
+    return @res;
+}
+
+package TestApp::Tag;
+
+BEGIN {
+use Jifty::DBI::Schema;
+use Jifty::DBI::Record schema {
+    column node => type is 'integer',
+        refers_to TestApp::Node;
+    column value => type is 'varchar(36)';
+    column nodes => refers_to TestApp::NodeCollection
+        by tisql => 'nodes.tags.value = .value';
+};
+}
+
+sub init_data {
+    my @res = (
+        [ 'node', 'value' ],
+    );
+    foreach my $o ( 1 .. $total_objs ) {
+        my $add = int rand $max_tags;
+        my %added;
+        while ( $add-- ) {
+            my $tag;
+            do {
+                $tag = $tags[ int rand @tags ];
+            } while $added{ $tag }++;
+            push @res, [ $o, $tag ];
+        }
+    }
+    return @res;
+}
+

Modified: Jifty-DBI/branches/tisql/t/tisql/searches_tags.t
==============================================================================
--- Jifty-DBI/branches/tisql/t/tisql/searches_tags.t	(original)
+++ Jifty-DBI/branches/tisql/t/tisql/searches_tags.t	Mon Sep 15 01:21:17 2008
@@ -9,7 +9,7 @@
 BEGIN { require "t/utils.pl" }
 our (@available_drivers);
 
-use constant TESTS_PER_DRIVER => 140;
+use constant TESTS_PER_DRIVER => 152;
 
 my $total = scalar(@available_drivers) * TESTS_PER_DRIVER;
 plan tests => $total;
@@ -95,6 +95,14 @@
         ".tags.value != 'x' AND .tags.value = 'y'" => [qw()],
         ".tags.value != 'x' OR  .tags.value = 'y'" => [qw(a aa at axy m mm mt mqwe)],
 
+        # tag != x and/or tag != y
+        ".tags.value != 'no' AND .tags.value != 't'" => [qw(a aa axy m mm mqwe)],
+        ".tags.value != 'no' OR  .tags.value != 't'" => [qw(a aa at axy m mm mt mqwe)],
+        ".tags.value != 'a' AND .tags.value != 't'" => [qw(a axy m mm mqwe)],
+        ".tags.value != 'a' OR  .tags.value != 't'" => [qw(a aa at axy m mm mt mqwe)],
+        ".tags.value != 'x' AND .tags.value != 'y'" => [qw(a aa at m mm mt mqwe)],
+        ".tags.value != 'x' OR  .tags.value != 'y'" => [qw(a aa at m mm mt mqwe)],
+
         # has .tag != x
         "has .tags.value != 'no'" => [qw(aa at axy mm mt mqwe)],
         "has .tags.value != 'a'" => [qw(at axy mm mt mqwe)],
@@ -108,11 +116,11 @@
         "has no .tags.value != 'q'"  => [qw(a m)],
 
         # crazy things
-
+### XXX, TODO, FIXME
         # get all nodes that have intersection in tags with article #3 (at)
-        ".tags.nodes.id = 3" => [qw(at mt)],
+#        ".tags.nodes.id = 3" => [qw(at mt)],
         # get all nodes that have intersactions in tags with nodes that have tag 't'
-        ".tags.nodes.tags.value = 't'" => [qw(at mt)],
+#        ".tags.nodes.tags.value = 't'" => [qw(at mt)],
 
     );
 
@@ -168,6 +176,23 @@
 ) },
 ] }
 
+sub schema_mysql { [
+q{ CREATE table nodes (
+    id integer primary key auto_increment,
+    type varchar(36),
+    subject varchar(36)
+) },
+q{ CREATE table tags (
+    id integer primary key auto_increment,
+    node integer not null,
+    value varchar(36)
+) },
+] }
+sub cleanup_schema_mysql { [
+    "DROP table tags", 
+    "DROP table nodes", 
+] }
+
 package TestApp::TagCollection;
 use base qw/Jifty::DBI::Collection/;
 our $VERSION = '0.01';

