Cassandra~
1. Cassandra replaces the value to be deleted with a special marker called a tombstone, because some nodes may be down and unable to receive the deletion request.
2. The tombstone is deleted after GCGraceSeconds. So a node that has been down for longer than GCGraceSeconds can never receive the tombstone. You need to remove that node.
http://wiki.apache.org/cassandra/DistributedDeletes
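To see why the tombstone has to outlive any outage, here is a minimal C++ sketch of last-write-wins reconciliation. It is not Cassandra's code: the Cell struct, reconcile(), purgeable(), and the use of plain seconds for timestamps are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified model of one stored cell.
struct Cell {
    std::string value;       // ignored when tombstone is true
    std::int64_t timestamp;  // write time, in seconds for simplicity
    bool tombstone;          // true means "this value was deleted"
};

// Last-write-wins: the cell with the newer timestamp survives.
Cell reconcile(const Cell& a, const Cell& b) {
    return (b.timestamp > a.timestamp) ? b : a;
}

// A tombstone may be dropped once it is older than the grace period.
bool purgeable(const Cell& c, std::int64_t now, std::int64_t gc_grace_seconds) {
    return c.tombstone && (now - c.timestamp) > gc_grace_seconds;
}
```

As long as the tombstone exists, it outranks the older live value on any replica. If a replica that still holds the old value comes back after the tombstone has been purged everywhere else, there is nothing newer left to reconcile against, so the deleted value would reappear.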
funny stuff
let me see.
Friday, January 13, 2012
Thursday, December 16, 2010
c++ exceptions and memory leaks
I had to come up with a way to handle errors and exceptional conditions using the C++ exception mechanism.
First I referred to the Google C++ coding conventions. They say it is not good to introduce C++ exceptions into legacy code that doesn't already use them.
One of the reasons is that it causes resource leaks. See the following code.
Here, h() calls g(), and g() calls f(). The problem is that when f() throws an exception, memory is leaked in g() unless g() prepares for this with RAII or some other means.
g() has to be very careful to clean up every resource it created when the stack unwinds.
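#include <exception>
#include <iostream>
using namespace std;
// Exception type thrown by f(); the message text is only illustrative.
class myexception : public exception
{
public:
    const char* what() const noexcept override { return "myexception happened"; }
} myex;
// The original declared this as throw(myexception); dynamic exception
// specifications were removed in C++17, so it is left out here.
void f()
{
    cout << "f() called" << endl;
    throw myex;
}
void g()
{
    int* leak = new int[10];
    cout << "g() called" << endl;
    f(); // throws, so the delete[] below never runs
    delete[] leak;
}
void h()
{
    cout << "h() called" << endl;
    try {
        g();
    }
    catch (exception& e)
    {
        cout << e.what() << endl;
    }
}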
valgrind --tool=memcheck --leak-check=yes ./a.out
==11647==
==11647== HEAP SUMMARY:
==11647== in use at exit: 40 bytes in 1 blocks
==11647== total heap usage: 2 allocs, 1 frees, 176 bytes allocated
==11647==
==11647== 40 bytes in 1 blocks are definitely lost in loss record 1 of 1
==11647== at 0x4C27939: operator new[](unsigned long) (vg_replace_malloc.c:305)
==11647== by 0x400DAF: g() (in /home/nhn/workspace/study/c++/exception/a.out)
==11647== by 0x400E15: h() (in /home/nhn/workspace/study/c++/exception/a.out)
==11647== by 0x400E93: main (in /home/nhn/workspace/study/c++/exception/a.out)
==11647==
==11647== LEAK SUMMARY:
==11647== definitely lost: 40 bytes in 1 blocks
==11647== indirectly lost: 0 bytes in 0 blocks
==11647== possibly lost: 0 bytes in 0 blocks
==11647== still reachable: 0 bytes in 0 blocks
==11647== suppressed: 0 bytes in 0 blocks
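When you use the RAII pattern, the memory owned by the RAII object is not leaked, because its destructor runs during stack unwinding. Note that the raw leak pointer in the g() below still leaks; only myRaii's buffer is released.
class MyRAII
{
public:
    MyRAII()
    {
        cout << "MyRAII() called" << endl;
        m_data = new int[10];
    }
    ~MyRAII()
    {
        cout << "~MyRAII() called" << endl;
        delete[] m_data;
    }
private:
    int* m_data;
};
void g()
{
    int* leak = new int[10]; // still leaks when f() throws
    MyRAII myRaii; // no leak: the destructor runs during stack unwinding
    cout << "g() called" << endl;
    f();
    delete[] leak;
}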
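To make g() fully leak-free, the raw pointer itself can be handed to a standard RAII owner. The following is a minimal sketch, assuming C++11 and std::unique_ptr; Buffer, f_throws(), g_leaky(), g_safe() and live_buffers_after() are hypothetical names invented for this illustration, and the live counter exists only to make the effect observable without valgrind.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for f(): always throws.
void f_throws() { throw std::runtime_error("f() failed"); }

// Counts live instances so the effect of stack unwinding can be observed.
struct Buffer {
    static int live;
    int data[10];
    Buffer() { ++live; }
    ~Buffer() { --live; }
};
int Buffer::live = 0;

// Same shape as the leaking g(): the delete is skipped when f_throws() throws.
void g_leaky() {
    Buffer* raw = new Buffer;
    f_throws();
    delete raw;  // never reached
}

// RAII version: unique_ptr's destructor frees the Buffer during unwinding.
void g_safe() {
    std::unique_ptr<Buffer> owned(new Buffer);
    f_throws();
}

// Plays the role of h(): calls g, swallows the exception, and reports
// how many Buffer objects are still alive afterwards.
int live_buffers_after(void (*g)()) {
    try {
        g();
    } catch (const std::exception&) {
    }
    return Buffer::live;
}
```

Calling live_buffers_after(g_safe) should report no live buffers, while live_buffers_after(g_leaky) leaves one behind. For a plain array like the int[10] above, std::vector<int> would do the same job with less ceremony.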
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml?showone=Exceptions#Exceptions
http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Fcplr155.htm
Wednesday, September 1, 2010
Hadoop record boundary
I am working on building an infrastructure for analyzing a vast amount of email.
I packed the email archives into chunks of 2 GB each.
When concatenating the emails into a chunk, there had to be a sync mark to distinguish the boundaries between emails.
So I put sync marks between the emails.
However, when I put these files on the DFS and ran a MapReduce job, I was worried about records that straddle the boundary between two input splits, and I was curious how Hadoop deals with them.
I found the answer at this link.
http://wiki.apache.org/hadoop/FAQ#A23
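As I understand the FAQ answer, a reader whose split does not start at offset 0 skips forward to the first sync mark in its split, and every reader keeps reading past the end of its split until it reaches the next sync mark, so each record is read exactly once. Here is a rough C++ sketch of that rule. It is not Hadoop's RecordReader: read_split() is a hypothetical helper, and it assumes the sync mark sits between emails and never appears inside one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the records owned by the split [start, start + length).
// A record is owned by the split in which its preceding sync mark begins;
// the very first record of the file is owned by the split starting at 0.
std::vector<std::string> read_split(const std::string& data,
                                    const std::string& sync,
                                    std::size_t start, std::size_t length) {
    std::vector<std::string> records;
    if (start >= data.size()) return records;
    const std::size_t end = start + length;

    std::size_t pos = 0;  // where the next record to read begins
    if (start != 0) {
        // Skip the partial record; the previous split's reader finishes it.
        const std::size_t first = data.find(sync, start);
        if (first == std::string::npos || first >= end) return records;
        pos = first + sync.size();
    }

    while (true) {
        const std::size_t next = data.find(sync, pos);
        // Read the whole record, even if it runs past the end of the split.
        records.push_back(next == std::string::npos
                              ? data.substr(pos)
                              : data.substr(pos, next - pos));
        // Stop at end of file or once the next sync mark belongs to a later split.
        if (next == std::string::npos || next >= end) break;
        pos = next + sync.size();
    }
    return records;
}
```

With three emails joined by a 10-byte marker and splits of 12 bytes, the first split would yield the first two emails (the second one is read past the split end), the second split yields the third, and the last split yields nothing.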
Friday, April 9, 2010
nginx php-fpm TIME_WAIT
When you use nginx as a FastCGI web server with php-fpm workers, you might come across a TCP TIME_WAIT problem.
TIME_WAIT is the state in which a closed TCP socket waits for a certain amount of time before it is fully released.
Because nginx and php-fpm communicate so frequently, you will find more than 30,000 sockets accumulating in TIME_WAIT. It looks uncomfortable.
The solution is to use a Unix domain socket.
1. php-fpm.conf (listen address)
/tmp/fastcgi.sock
2. nginx.conf
fastcgi_pass unix:/tmp/fastcgi.sock;
You will find a great reduction in the number of sockets in TIME_WAIT.
However, in my test the performance was worse than with a TCP socket.
I don't know why...
Anyway, in my benchmark nginx was able to process 4 to 5 times as many requests as Apache (on the order of 10,000 per second). It's awesome.
Thursday, April 1, 2010
apache mod_rewrite with percent signs
This article shows a little bit of how to use mod_rewrite.
The scenario is:
1. A client connects to the web server with a URL containing '%' signs.
For example, when you use an MBCS (e.g. Korean), the URL turns into another form.
http://test.com/테스트 => http://test.com/%C5%D7%BD%BA%C6%AE
The solution is as follows, though I don't know the internals.
httpd.conf
RewriteEngine on
RewriteRule /테스트 http://test.com [R]
RewriteLog "rewrite.log"
RewriteLogLevel 10
rewrite.log
localhost - - [01/Apr/2010:14:38:07 +0900] [test.com/sid#94f2490][rid#95481a0/initial] (2) init rewrite engine with requested uri /테스트
localhost - - [01/Apr/2010:14:38:07 +0900] [test.com/sid#94f2490][rid#95481a0/initial] (3) applying pattern '/%C5%D7%BD%BA%C6%AE' to uri '/테스트'
localhost - - [01/Apr/2010:14:38:07 +0900] [test.com/sid#94f2490][rid#95481a0/initial] (1) pass through /테스트
localhost - - [01/Apr/2010:14:39:46 +0900] [test.com/sid#8785490][rid#87db118/initial] (2) init rewrite engine with requested uri /테스트
localhost - - [01/Apr/2010:14:39:46 +0900] [test.com/sid#8785490][rid#87db118/initial] (3) applying pattern '/테스트' to uri '/테스트'
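On the internals: as far as I know, the mod_rewrite documentation says the RewriteRule pattern is matched against the %-decoded URL path. That would explain the log above, where the pattern written in raw characters is applied to the already decoded uri while the percent-encoded pattern passes through. Here is a minimal C++ sketch of that decoding step; percent_decode() and hex_value() are names made up for this illustration, not Apache functions.

```cpp
#include <cstddef>
#include <string>

// Value of one hex digit, or -1 if c is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Turns every valid %XX escape into the single byte it stands for.
// Malformed escapes (for example a trailing '%') are copied unchanged.
std::string percent_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Decoding "/%C5%D7%BD%BA%C6%AE" this way gives a slash followed by six raw bytes, which is the form a pattern written as /테스트 in httpd.conf would have to match, provided the file is saved in the same encoding the client used.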
Tuesday, January 12, 2010
C++ new operator and object cloning.
The C++ standard library provides a few overloads of the global operator new, which the new keyword uses to allocate a memory block for an instance. operator new only allocates memory; it does not construct anything.
These are declared in the header <new>.
When you include it, they are available in the global namespace.
http://www.cplusplus.com/reference/std/new/
You can allocate a memory block for an uninitialized object (meaning its constructor has not been called yet) and initialize it with its constructor later, using placement new.
For instance, using this feature, you can clone an object.
--------------------------------------------------------------------------------
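#include <cstdlib>
#include <iostream>
#include <new>
using namespace std;
struct myclass {
    myclass(int i):m_i(i) {cout << "myclass constructed\n";}
    myclass(const myclass &src) {
        this->m_i = src.m_i;
    }
    void print() { cout << "my member is " << this->m_i << endl; }
    int m_i;
};
int main() {
    myclass tmp(1);
    // Allocate raw, uninitialized memory; no constructor runs here.
    myclass *p3 = (myclass *)malloc(sizeof(myclass));
    if (p3 == NULL) return 1;
    new(p3) myclass(tmp); // placement new: run the copy constructor on that memory
    //operator new (sizeof(myclass), p3);
    p3->print();
    p3->~myclass(); // an object built with placement new needs an explicit destructor call
    free(p3);
    return 0;
}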
Thursday, January 7, 2010
RETE algorithm.
I'm working on developing a rule-based inference engine nowadays.
I came across an algorithm called RETE, which can do really fast pattern matching of facts against rules.
When new facts keep being added to the existing ones, or facts are continuously changing, RETE is the solution.
The key idea of RETE is the cache memory in each node. Once a node has calculated a result for an input, it saves that result in its memory. When additional facts become known, it computes only on the newly added facts and joins them with the existing results in the memories.
The separation between the alpha network and the beta network is also an amazing idea.
There are a few implementations of this algorithm: Jess in Java and CLIPS in C. CLIPS is in the public domain; Jess is free for academic use but is not open source.
Check the following link out.
http://en.wikipedia.org/wiki/Rete_algorithm
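The node-memory idea can be sketched with a single two-input join node. This is only a toy model under my own assumptions, not how Jess or CLIPS are implemented: Fact, Match and JoinNode are hypothetical names, and the join test is simply "same key".

```cpp
#include <string>
#include <vector>

// Hypothetical fact: a key to join on plus some payload.
struct Fact {
    std::string key;
    std::string value;
};

// One pair of facts that satisfied the join test.
struct Match {
    Fact left;
    Fact right;
};

// A two-input join node that remembers every fact seen on each input.
class JoinNode {
public:
    // Adds a fact on the left input and returns only the NEW matches.
    std::vector<Match> add_left(const Fact& f) {
        left_memory_.push_back(f);
        std::vector<Match> fresh;
        for (const Fact& r : right_memory_) {
            if (r.key == f.key) fresh.push_back({f, r});
        }
        return fresh;
    }

    // Adds a fact on the right input and returns only the NEW matches.
    std::vector<Match> add_right(const Fact& f) {
        right_memory_.push_back(f);
        std::vector<Match> fresh;
        for (const Fact& l : left_memory_) {
            if (l.key == f.key) fresh.push_back({l, f});
        }
        return fresh;
    }

private:
    std::vector<Fact> left_memory_;   // facts seen so far on the left input
    std::vector<Fact> right_memory_;  // facts seen so far on the right input
};
```

Each call compares the new fact only against the opposite memory, so earlier matches are never recomputed. A real RETE network chains many such nodes and also handles fact retraction, which this sketch leaves out.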