fyi: space leak & seg fault in both slapd 3.2 & 3.3

Jeff.Hodges@Stanford.EDU
Fri, 17 May 96 09:47:44 -0700

------- Forwarded Message
Subject: space leak & seg fault in both slapd 3.2 & 3.3
To: bug-ldap@umich.edu
cc: ldap-support@umich.edu, ns-dirsvc-impl@lists.stanford.edu
From: Jeff.Hodges@Stanford.EDU
Reply-to: ns-dirsvc-impl@lists.stanford.edu
X-Office: Pine Hall Rm 161; 415-723-2452
Date: Fri, 17 May 96 09:35:25 -0700

I've found that both slapd 3.2 & 3.3 exhibit space leak behavior (process size
grows w/o bound over time) and both will eventually seg fault when driven flat
out by persistent clients.

I was going to pinpoint the seg fault location for you but, unfortunately, our
current versions of gdb are failing to understand the solaris core file
format, so I'm flyin' blind.

Details below. I wanted to notify you folks that we've found this. What
possibilities are there for help in tracking it down and fixing it? We're
working on fixing our debugging tools on this end (goes without saying, sigh),
so at some point I'll be able to find where at least it is failing, but I'm
not sure how soon.

thanks,

Jeff
- ----------------------------------------------------------------------------
What I did to cause the seg fault was write a simple perl script (below) that
takes in a list of names and queries the directory for them. I took the names
from the directory, so there is a hit on each one. Currently I have seven
machines (which aren't doing a lot else) run the script and pound on a given
(test) slapd server. It will chug along for somewhere between 4000 and 10000
connections before it goes belly-up. This test system is...

Sparc 2, 64 MB ram, Solaris 2.4

..with ldap-3.3/3.2 compiled with gcc v2.7.

In terms of basic space leak behavior, it is pretty apparent that the process
size, as noted in the SZ and RSS fields of /usr/ucb/ps output, grows
continually given use. I have seen it up to 29mb on our production (lightly
loaded) server after about a week (and whereupon it died (don't know for sure
if it was a seg fault)).

It's interesting to note that altho the process size grows when slapd is
driven flat-out, it only reaches about 5mb when the seg fault occurs on the
test system, so I'm not sure if the failures on the hard-driven test system
and our lightly-driven (for now) production system are necessarily related.

Also innaresting is that slapd, on the relatively light-weight, test system,
is able to sustain a rate of about 8.5-9.0 "entries sent"/second. And this is
for "heavy-weight" queries where a BIND/UNBIND is done for each query (i.e.
simple execs of ldapsearch for each name).

I'd really like to see if we can't get slapd to be albe to sustain that rate
for weeks on end without failure.

- -----------------------------------------------------------------------------
-
#!/usr/local/bin/perl

# QueryForNames.pl
#
# a quick hack. takes a list of names on stdin, and queries
# the directory via ldapsearch for each name. The filter is "cn=<name>".
# The DN of each entry is printed to stdout, and the query return time
# is noted.

while($name = <STDIN>) {
chop $name;
open(LDAP, "/usr/pubsw/bin/ldapsearch -h directrix 'cn=$name' dn |");
while($result = <LDAP>) {
chop $result;
($result,$rest) = split(/,/,$result); # extract just the CN
$whenItIs = localtime;
print "$whenItIs; $result\n";
}
close(LDAP);
}
- -----------------------------------------------------------------------------
-

This is a time-stamped log of queries for cn=monitor of the test directory
while it was under assualt from seven machines (one being directrix itself).
The log documents the process growth, and also slapd's performance. This
particular test was against slapd 3.3.

directrix:/usr/local/etc[57]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:13:49 PDT 1996
12:13pm up 2 day(s), 18:11, 5 users, load average: 3.27, 1.42, 0.57
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=7
connection=5 : 960516121351Z : 2 : 2 : NULLDN :
connection=11 : 960516121351Z : 2 : 2 : NULLDN :
connection=12 : 960516121351Z : 2 : 2 : NULLDN :
connection=13 : 960516121351Z : 2 : 2 : NULLDN :
connection=14 : 960516121351Z : 2 : 2 : NULLDN :
connection=15 : 960516121351Z : 0 : 0 : NULLDN :
connection=17 : 960516121351Z : 1 : 1 : NULLDN :
currentconnections=7
totalconnections=1621
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=4855
opscompleted=4854
entriessent=1612
bytessent=157057
currenttime=960516121352Z
starttime=960516114944Z
nbackends=1
concurrency=4
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 54.2 4.5 3384 2844 pts/1 R 11:49:43 1:57 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 27356 0.2 0.8 704 464 pts/3 S 12:13:52 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[58]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:15:18 PDT 1996
12:15pm up 2 day(s), 18:13, 5 users, load average: 3.66, 2.03, 0.88
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=0
connection=5 : 960516121518Z : 2 : 2 : NULLDN :
connection=11 : 960516121519Z : 1 : 1 : NULLDN :
connection=12 : 960516121518Z : 2 : 1 : NULLDN :
connection=13 : 960516121518Z : 2 : 2 : NULLDN :
connection=14 : 960516121519Z : 0 : 0 : NULLDN :
currentconnections=5
totalconnections=2383
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=7141
opscompleted=7140
entriessent=2374
bytessent=231973
currenttime=960516121519Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 53.1 5.0 3660 3120 pts/1 R 11:49:43 2:48 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 27482 0.2 0.8 704 464 pts/3 S 12:15:19 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[59]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:17:59 PDT 1996
12:17pm up 2 day(s), 18:16, 5 users, load average: 3.35, 2.59, 1.29
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=3
connection=5 : 960516121801Z : 2 : 1 : NULLDN :
connection=12 : 960516121801Z : 2 : 1 : NULLDN :
connection=15 : 960516121801Z : 2 : 1 : NULLDN :
connection=16 : 960516121802Z : 1 : 0 : NULLDN :
currentconnections=4
totalconnections=3843
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=11524
opscompleted=11520
entriessent=3828
bytessent=373757
currenttime=960516121802Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 54.4 5.7 4124 3584 pts/1 R 11:49:43 4:27 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 27722 0.2 0.8 704 472 pts/3 S 12:18:02 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[60]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:20:02 PDT 1996
12:20pm up 2 day(s), 18:18, 5 users, load average: 3.15, 2.81, 1.55
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=6
connection=5 : 960516122003Z : 2 : 0 : NULLDN :
connection=11 : 960516122004Z : 1 : 0 : NULLDN :
connection=12 : 960516122003Z : 3 : 2 : NULLDN :
connection=13 : 960516122003Z : 3 : 2 : NULLDN :
connection=14 : 960516122003Z : 2 : 1 : NULLDN :
connection=15 : 960516122004Z : 0 : 0 : NULLDN :
currentconnections=6
totalconnections=4926
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=14771
opscompleted=14765
entriessent=4910
bytessent=479130
currenttime=960516122004Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 56.1 6.3 4508 3968 pts/1 R 11:49:43 5:43 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 27899 0.2 0.8 704 464 pts/3 S 12:20:04 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[61]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:21:20 PDT 1996
12:21pm up 2 day(s), 18:19, 5 users, load average: 3.45, 2.98, 1.71
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=0
connection=5 : 960516122122Z : 1 : 1 : NULLDN :
connection=11 : 960516122122Z : 2 : 1 : NULLDN :
connection=12 : 960516122123Z : 0 : 1 : NULLDN :
connection=13 : 960516122122Z : 2 : 2 : NULLDN :
connection=14 : 960516122122Z : 2 : 2 : NULLDN :
currentconnections=5
totalconnections=5631
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=16885
opscompleted=16884
entriessent=5615
bytessent=548043
currenttime=960516122123Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 54.4 6.5 4648 4104 pts/1 R 11:49:43 6:27 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 28001 0.2 0.8 704 472 pts/3 S 12:21:23 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[62]>
directrix:/usr/local/etc[62]>
directrix:/usr/local/etc[62]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:23:01 PDT 1996
12:23pm up 2 day(s), 18:21, 5 users, load average: 3.45, 3.11, 1.90
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=0
connection=5 : 960516122302Z : 2 : 2 : NULLDN :
connection=11 : 960516122302Z : 2 : 1 : NULLDN :
connection=12 : 960516122302Z : 0 : 0 : NULLDN :
connection=13 : 960516122302Z : 0 : 0 : NULLDN :
currentconnections=4
totalconnections=6507
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=19513
opscompleted=19512
entriessent=6484
bytessent=632902
currenttime=960516122302Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 55.4 6.9 4904 4360 pts/1 R 11:49:43 7:26 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 28143 0.2 0.8 704 464 pts/3 S 12:23:02 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[63]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:24:51 PDT 1996
12:24pm up 2 day(s), 18:23, 5 users, load average: 3.36, 3.20, 2.08
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=0
connection=5 : 960516122455Z : 0 : 0 : NULLDN :
connection=11 : 960516122455Z : 2 : 5 : NULLDN :
connection=12 : 960516122455Z : 2 : 2 : NULLDN :
connection=13 : 960516122455Z : 2 : 1 : NULLDN :
connection=14 : 960516122455Z : 0 : 0 : NULLDN :
currentconnections=5
totalconnections=7524
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=22563
opscompleted=22562
entriessent=7499
bytessent=731966
currenttime=960516122455Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 57.8 7.4 5224 4684 pts/1 R 11:49:43 8:37 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 28301 0.2 0.8 704 472 pts/3 S 12:24:56 0:00 egrep -i pid|slapd
hodges 27045 0.1 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[64]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:26:46 PDT 1996
12:26pm up 2 day(s), 18:24, 5 users, load average: 3.23, 3.26, 2.23
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=0
connection=5 : 960516122648Z : 0 : 0 : NULLDN :
connection=11 : 960516122647Z : 2 : 2 : NULLDN :
connection=12 : 960516122648Z : 2 : 2 : NULLDN :
connection=13 : 960516122648Z : 2 : 1 : NULLDN :
connection=14 : 960516122648Z : 0 : 0 : NULLDN :
connection=15 : 960516122647Z : 2 : 2 : NULLDN :
currentconnections=6
totalconnections=8521
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=25553
opscompleted=25552
entriessent=8490
bytessent=828887
currenttime=960516122648Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 54.7 7.8 5472 4932 pts/1 R 11:49:43 9:42 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 28463 0.2 0.8 704 472 pts/3 S 12:26:48 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[65]> date; uptime; ldapsearch -h directrix -s base -b
"cn=monitor" "objectclass=*"; proc slapd
Thu May 16 12:28:51 PDT 1996
12:28pm up 2 day(s), 18:26, 5 users, load average: 3.10, 3.23, 2.36
cn=monitor
version=slapd 3.3 (Wed May 15 15:26:42 PDT 1996)
threads=0
connection=5 : 960516122853Z : 2 : 2 : NULLDN :
connection=11 : 960516122853Z : 2 : 1 : NULLDN :
connection=12 : 960516122853Z : 0 : 0 : NULLDN :
connection=13 : 960516122853Z : 0 : 0 : NULLDN :
connection=14 : 960516122853Z : 1 : 1 : NULLDN :
currentconnections=5
totalconnections=9639
dtablesize=64
writewaiters=0
readwaiters=0
opsinitiated=28907
opscompleted=28906
entriessent=9595
bytessent=939372
currenttime=960516122853Z
starttime=960516114944Z
nbackends=1
concurrency=5
USER PID %CPU %MEM SZ RSS TT S START TIME COMMAND
root 27040 55.5 8.2 5748 5196 pts/1 R 11:49:43 10:57 ./slapd -d 256 -f
slapd.directrix.test.conf
hodges 28642 0.2 0.8 704 472 pts/3 S 12:28:53 0:00 egrep -i pid|slapd
hodges 27045 0.0 0.7 708 444 pts/5 S 11:50:30 0:00 tail -f slapd.log
directrix:/usr/local/etc[66]>
directrix:/usr/local/etc[66]>

- -----------------------------------------------------------------------------
-

==========================================================================
This message was posted through the Stanford campus mailing list
server. If you wish to unsubscribe from this mailing list, send the
message body of "unsubscribe ns-dirsvc-impl" to majordomo@lists.stanford.edu

------- End of Forwarded Message