;-*-mode: text; mode: auto-fill-*-


BACKGROUND
==================================


In short: scache is an alternative approach to session caching in a
PHP environment. It provides tree-based storage for keeping session
data in small parts. It guarantees that session data is either stored
in complete or completely expired.



Long version: I started developing scache in 2008. That autumn I got
lots of other work-related things to complete, so I put it on hold for
over two years. Now, in 2011, I woke it up and had big problems
remembering how it worked and in what state I had left it.

The current status of scache is somewhere between alpha and beta.
Basically it works for me, but I'm not in any way sure whether it
works or will ever work for anybody else. If you are planning to test
it, you should have some knowledge of debugging and fixing problems in
unix-style code.

  - If you plan to debug, compile it with --enable-devel so that 
    it runs standalone under gdb from source directory. 
    Especially have fun with io-loop :)



Why (the motivation)
----------------------------------


Scache originates from the urge to run multiple identical www-frontend
servers efficiently behind a shared round-robin server URL, where the
browser can freely hop from frontend to frontend on subsequent
requests. That led to problems with sharing session data between those
frontends.


The common way of handling sessions is to use PHP's $_SESSION
variable. In bigger environments people use mysql-backed memcached or
some other similar solution to overcome performance problems.

I once tried to fit my lousy web apps to use $_SESSION, but didn't get
it to fit in a way I felt was sensible. Everywhere I ended up
thinking, "should I store that large array in the session", knowing
that once I store it there, it will be loaded from the cache on every
page whether it is needed or not. Or "should I reload it from the
database" every time, and that way put lots of unnecessary load on the
database backend. I have fetched the result set from the db before,
why should I do it again?

$_SESSION is not a good place to store data. It's one single unit that
gets reloaded for every page request even if you need nothing from
it. So don't make it too large.


One solution is to use memcached, but the problem is that it's a
cache. It doesn't guarantee to contain everything you put in there. If
it's out of space, it drops part of its contents and you can't control
what it will drop.

So you eventually need to store data in two places. Store it once in
the cache, where you can access it fast and where it will be found in
almost every case, but as a backup also somewhere else, most probably
in mysql, where you can find it in the rare case that memcached has
dropped it.

But again, why should it be stored twice just in case?

(Of course you can store everything under a single key, and bravely
omit the redundant storing, assuming that if that key is gone, the
session is gone. But that leads back to the earlier notes about
storing and fetching data unrelated to the current request, and
eventually to compromising again on whether I store it in an already
full single cache key or re-fetch it from the database.)


Another way is to send the data to the client and expect it to post it
back, but with that method you end up re-checking (hopefully you do
check) very extensively the input you just sent to the client, to
verify it has not suddenly changed into attack code. Sending data back
and forth is also unnecessary bandwidth consumption.

What I would like to do is fetch the result set to be operated on from
the database, store it locally on the server to avoid stressing the
database, and send only an index to the client. To avoid unnecessary
input validation I would expect an (int)index from the client, get the
right row from the result set stored on the server, and operate on it.
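
The flow described above could be sketched roughly like this. It is
only an illustration: the array $server_store stands in for whatever
server-side storage is used, and the helper names remember_rows() and
pick_row() are made up for this example.

```php
<?php
// Stand-in for server-side per-session storage (hypothetical).
$server_store = array();

// Keep a fetched result set on the server and return only its indexes,
// which are the only thing that needs to be sent to the client.
function remember_rows(array &$store, $tag, array $rows) {
    $store[$tag] = array_values($rows);
    return array_keys($store[$tag]);
}

// Look up the row the client picked. The client input is forced to an
// integer, so all that remains is checking that the index exists.
function pick_row(array $store, $tag, $client_input) {
    $i = (int)$client_input;
    return isset($store[$tag][$i]) ? $store[$tag][$i] : null;
}

// Pretend these rows came from the database.
$rows = array(array('id' => 17, 'name' => 'foo'),
              array('id' => 42, 'name' => 'bar'));

$indexes = remember_rows($server_store, 'search_result', $rows);
$row     = pick_row($server_store, 'search_result', '1');
```

Note that the (int) cast turns non-numeric input into 0, so garbage
from the client can at worst select another row of the same session's
own data.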


I ended up dropping $_SESSION altogether and solved the whole problem
with two simple functions.


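   function session_load($tag) {
        // $myroot and $session are set elsewhere by the application
        global $myroot, $session;

        // returns null if nothing is stored under $tag
        $s = @file_get_contents("$myroot/$session/cache/$tag");
        return ($s !== false) ? unserialize($s) : null;
    }
    
   function session_store($tag, &$a) {
        global $myroot, $session;

        $s = serialize($a);
        $p = posix_getpid();
        $n = "$myroot/$session/cache/$tag";

        // write to a per-process temp file, close it, and only then
        // rename it over the real name, so that a reader never sees
        // a partially written file
        if (!(($f = fopen("$n.$p", 'w')) &&
              (fwrite($f, $s) === strlen($s)) &&
              (fclose($f)) &&
              (rename("$n.$p", "$n")))) do_panic_hard();
    }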


With these I can session_store('my_specific_pages_data', $data) and
session_load('my_specific_pages_data') only where necessary. Of
course, because it's not a native handler, it has some performance
impact, but on the other hand this way I can store as much as I want
and load it only when required.


But this works well only as long as there's a single frontend server.
When running with multiple frontends, file-backed session storage is
much harder to implement and in practice requires a shared filesystem
like OCFS2. That kind of configuration unfortunately tends to be a bit
heavyweight, because of all the filesystem locking and redundancy
between nodes. Notably, it is heavyweight not because of anything the
web service requires, but because using a shared filesystem for this
is the wrong way to do it.


So I wrote scache.


SCACHE
==================================


Scache has the following principles:

- All data that is inserted is kept until destroyed at the client's
request (including session expiration)

- In an error situation, the session is expired in complete (unless
requested otherwise)

- Data organization resembles a filesystem: there are keys that can be
organized in subtrees, and whole subtrees can be destroyed at once,
for example when a whole section gets invalidated. A network-accessible
Windows registry could be the closest equivalent.

- Data is intended to be accessed in small parts, so that only
immediately necessary data is fetched. There are commands for
single-key operations and also scache_iov() for very fast multiple
operations in one request.

- Operations must be fast. Internally scache can process a couple of
hundred thousand requests per second even with large session trees,
but network and socket latencies drop that to some 10000-20000
operations per second. However, multiple clients can each be served at
that rate, so you can easily assume a total rate of 50000-100000
operations/second.

- Scached is non-threaded: requests are processed in order, i.e. every
request is atomic. Multi-requests from scache_iov() are also processed
in complete before another client is served.
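
To make the filesystem-like organization more concrete, here is a toy
in-memory model written in plain PHP. It is not scache's API or its
implementation: the class ToyTree and its methods are invented for
this illustration, and it only shows how path-like keys let a whole
subtree be destroyed in one operation.

```php
<?php
// Toy model of path-like keys; NOT scache's API or implementation.
class ToyTree {
    private $data = array();

    public function set($path, $value) {
        $this->data[$path] = $value;
    }

    public function get($path) {
        return array_key_exists($path, $this->data)
            ? $this->data[$path] : null;
    }

    // Remove a key and everything stored beneath it.
    public function destroy($path) {
        foreach (array_keys($this->data) as $k) {
            $k = (string)$k;
            if ($k === $path || strpos($k, "$path/") === 0)
                unset($this->data[$k]);
        }
    }

    public function count() {
        return count($this->data);
    }
}

$t = new ToyTree();
$t->set('cart/items/0', 'book');
$t->set('cart/items/1', 'pen');
$t->set('user/name', 'alice');

// Invalidate the whole cart section at once; 'user/name' stays.
$t->destroy('cart');
```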



WHAT TO DO NEXT?
==================================

- there's documentation and examples in the /docs directory
- the same documentation and examples are at the address

    http://scache.nanona.fi/

The current status of scache is that it works for me and I can't get
it to crash any longer, but I have no evidence that it works for
anyone else. Please drop me a note at scache@nanona.fi if you test
this and encounter bugs or have other comments.


The most urgent TODOs for me are:

- documentation practically doesn't exist, and where it exists it's
probably hard to understand and full of typos and solecisms.

- fix bugs

