Note: The opinions expressed here are my own and have no relationship with the opinions or official viewpoints of any organization with which I am associated.
This particular code has been obsolete for decades. If SCO is still using it in their product, it suggests that there is something seriously out of date in their “enterprise level UNIX”.
Assisted by his vice president, Chris Sontag, McBride showed examples from the code of Linux 2.5 and 2.6 which should prove that source code has been taken out of Unix without change–an example shown by SCO shows code commentaries ... Identical typing mistakes in the commentaries and unusual formulations had left traitorous traces, claimed Sontag. To prove this, McBride had hired a pattern recognition team to hunt through tens of thousands lines of code. The few code sequences near the comments were made illegible to protect SCO's copyrights.
There's not much information here, and on my first attempt I came to incorrect conclusions. Firstly, the claims are stupid. There's no code in common, just a comment which, admittedly, looks to be the same. I also don't see any “typing mistakes”. But where does it come from? On the SCO side, it includes another line (“The swap map unit is 512 bytes”). Maybe this is not correct for Linux. But people who copy comments so literally don't remove things just because they're wrong. They haven't fixed the broken indentation, for example, assuming that this is really broken indentation in the code and not a badly prepared slide; the latter is entirely possible.
But if two comments are the same except for an addition, which is the original? The one without the addition, obviously. I initially saw this as an indication that the code was copied from Linux to UnixWare.
In addition, the alleged code sequences near the comments which were made illegible to protect SCO's copyrights are really additional commentary. You don't have to be a C programmer to recognize that comments start with /* and end with */, and that people frequently put a single * in multiline comments for stylistic reasons, something that the person who put together this slide obviously didn't consider important.
The comment is in English written in approximate Greek letters and reads:
As part of the kernel evolution toward modular naming, the functions malloc and mfree are being renamed to rmalloc and rmfree. Compatibility will be maintained by the following assembler code: (also see mfree/rmfree below)
This comment is completely irrelevant to the Linux code to which the first half of the comment (the part written in Roman letters) has been applied. The Linux version of both of these examples comes from the file arch/ia64/sn/io/ate_utils.c. There are a number of interesting things to note about this file:
/* $Id: ate_utils.c,v 1.1 2002/02/28 17:31:25 marcelo Exp $
 *
 * This file is subject to the terms and conditions of the GNU General Public
 * License. See the file "COPYING" in the main directory of this archive
 * for more details.
 *
 * Copyright (C) 1992 - 1997, 2000-2002 Silicon Graphics, Inc. All rights reserved.
 */
This is not new code, but the RCS identifier (the string starting with $Id$ in the second line) shows that it was incorporated by marcelo on 28 February 2002. marcelo is Marcelo W. Tosatti. This is not the date the code was written, but the date when it was last checked in to the version control system.
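For readers who have not met RCS identifiers, the fields can be pulled apart mechanically. The following is only a sketch: the struct rcs_id and the function parse_rcs_id are names made up for this illustration, and the format string assumes the usual expanded layout (file name, revision, date, time, author, state) seen in the header above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the fields of an expanded $Id$ string. */
struct rcs_id {
    char file[64];
    char revision[16];
    int  year, month, day;
    char author[32];
};

/*
 * Split an expanded RCS $Id$ string into its fields.
 * Returns 1 if all six fields were found, 0 otherwise.
 * The check-in time is read but discarded.
 */
int parse_rcs_id(const char *s, struct rcs_id *out)
{
    int n;

    assert(s != NULL && out != NULL);
    n = sscanf(s, "$Id: %63[^,],v %15s %d/%d/%d %*d:%*d:%*d %31s",
               out->file, out->revision,
               &out->year, &out->month, &out->day, out->author);
    return n == 6;
}
```

Fed the identifier quoted above, this yields the author field marcelo and the date 2002/02/28, which is the check-in date and says nothing about when the code was first written.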
Further down in the Heise report, you can read:
In total, SCO's testers claim to have found more than 800,000 lines of duplicate code–an example from SCO
OK, let's look at this example. In fact, it's a continuation of the previous example, the function atealloc in arch/ia64/sn/io/ate_utils.c. There are a number of things to note about it:
In fact, it seems that mutex_spinlock() is not a base Linux primitive: it's a macro introduced by SGI for their port only. Thus this code would not even compile on any other version of Linux. It references Linux locking primitives, however, and not System V locking primitives.
My books about UnixWare SMP locking suggest that in this kind of situation, the lock call would be LOCK, not mutex_spinlock. It returns a value of type pl_t, whereas mutex_spinlock returns a value of type int, and it takes two parameters. This information has been available for years in Vahalia, UNIX Internals: The New Frontiers (Prentice-Hall, 1996), page 213. If SCO now has functions like mutex_spinlock in the kernel, it would appear that they have thrown out their own SMP implementation and incorporated the Linux version.
This point is crucial to the reason that I initially came to the wrong conclusion. As it stands, the code is not System V code. In fact, as we'll see below, it is derived from System V in exactly the way I describe.
After reading other opinions, notably those of Bruce Perens and friends (also since updated), I realized that I was wrong: the algorithm for the function atealloc is effectively the old UNIX algorithm for malloc(). SCO is incorrect in claiming that the code in question has been lifted from System V.4 without changes, but that doesn't change the fact that it obviously comes from System V.4. Here's the corresponding code in the Seventh Edition of UNIX (1978), which SCO (then called Caldera) released in early 2002:
/*
 * Allocate 'size' units from the given
 * map. Return the base of the allocated
 * space.
 * In a map, the addresses are increasing and the
 * list is terminated by a 0 size.
 * The core map unit is 64 bytes; the swap map unit
 * is 512 bytes.
 * Algorithm is first-fit.
 */
malloc(mp, size)
struct map *mp;
{
    register unsigned int a;
    register struct map *bp;
    for(bp=mp;bp->m_size && ((bp-mp) < MAPSIZ);bp++) {
        if (bp->m_size >= size) {
            a = bp->m_addr;
            bp->m_addr += size;
            if ((bp->m_size -= size) == 0) {
                do {
                    bp++;
                    (bp-1)->m_addr = bp->m_addr;
                } while ((bp-1)->m_size = bp->m_size);
            }
            return(a);
        }
    }
    return(0);
}
Both the comments and the code are obviously related. But some things are missing, and the comments
are formatted differently. In fact, it is almost identical with the oldest version of this
code, which was introduced in the Third Edition of Research UNIX in January 1973, the first
version of UNIX to be written in C. I've confirmed with a “reliable source” that
System V code includes the following changes:
/*
 * Allocate 'size' units from the given map.
 * Return the base of the allocated space.
 * In a map, the addresses are increasing and the
 * list is terminated by a 0 size.
 * The swap map unit is 512 bytes.
 * Algorithm is first-fit.
 */
This is now the same as the format in the slide, with the exception of broken line wrapping
in the slide.
In addition, System V.3 contains a check which looks surprisingly similar to the code in atealloc:
ASSERT(size >= 0);
if (size == 0)
    return((ulong_t) NULL);
The V.3 malloc() does the checks in the other sequence, so the ASSERT
uses the comparison > and not >=. The version in atealloc
is a tiny improvement.
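To make the shared core concrete, here is a sketch in modern C of a first-fit map allocator with the two argument checks in the order that atealloc uses. It is not taken from System V, Linux or any other source: the names map_entry and map_alloc, the field types, and the use of the standard assert() in place of a kernel ASSERT are all assumptions made for this illustration.

```c
#include <assert.h>

/* One free extent: m_size units starting at m_addr. A zero m_size ends the list. */
struct map_entry {
    long          m_size;
    unsigned long m_addr;
};

/*
 * Allocate 'size' units from the map, first fit.
 * Returns the base of the allocated space, or 0 if nothing fits,
 * so address 0 can never be handed out.
 */
unsigned long map_alloc(struct map_entry *mp, long size)
{
    struct map_entry *bp;

    assert(size >= 0);      /* debugging check first */
    if (size == 0)          /* then the cheap early return */
        return 0;

    for (bp = mp; bp->m_size != 0; bp++) {
        if (bp->m_size >= size) {
            unsigned long a = bp->m_addr;

            bp->m_addr += (unsigned long)size;
            bp->m_size -= size;
            if (bp->m_size == 0) {
                /* Extent used up: move the following entries down,
                 * including the terminating zero-size entry. */
                do {
                    bp++;
                    bp[-1] = bp[0];
                } while (bp[-1].m_size != 0);
            }
            return a;
        }
    }
    return 0;
}
```

With a map holding the extents (4 units at 100) and (10 units at 200), a request for 6 units skips the first extent and returns 200; a following request for 4 units returns 100 and closes up the list.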
There are further changes in System V.4. First, the function changes its name to rmalloc() and includes the “Greek” comments, which make sense in this context:
* As part of the kernel evolution toward modular naming, the
* functions malloc and mfree are being renamed to rmalloc and rmfree.
* Compatibility will be maintained by the following assembler code:
* (also see mfree/rmfree below)
The other change is the mutual exclusion of other callers. In Research UNIX, malloc() was only called from the process context. The kernel ensured that only one process could call the function at one time. This is apparently no longer the case in System V.4. It looks as if the bottom half can call malloc(), as is also the case in FreeBSD. This requires mutual exclusion, which is done with splimp() in FreeBSD and splhi() in System V.4.
Calling splxxx() functions is pretty mechanical. Define an integer variable, traditionally with the name s, and bracket the critical region with the following code:
int s;
...
s = splimp(); /* enter critical region */
...
splx(s); /* exit critical region */
In atealloc we see:
register unsigned int s;
...
s = mutex_spinlock(maplock(mp));
...
mutex_spinunlock(maplock(mp), s);
In other words, the function names have changed, but the way they're used has not.
SCO did not point this out in their presentation; this detracts from their credibility.
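The reason both variants hand back a value and take it again on exit is that the caller has to restore whatever state was in force before, which may already have been a raised level. The following user-space simulation is a sketch of that pattern only: sim_splhi, sim_splx, sim_current_ipl and IPL_HIGH are invented names, and a real kernel would program the interrupt hardware rather than a variable.

```c
#include <assert.h>

enum { IPL_HIGH = 7 };          /* hypothetical "block everything" level */

static int current_ipl = 0;     /* simulated interrupt priority level */
static int shared_counter;      /* data protected by the critical region */

/* Raise the level and return the previous one, in the manner of splhi() or splimp(). */
int sim_splhi(void)
{
    int old = current_ipl;

    current_ipl = IPL_HIGH;
    return old;
}

/* Restore a previously saved level, in the manner of splx(). */
void sim_splx(int s)
{
    assert(s >= 0 && s <= IPL_HIGH);
    current_ipl = s;
}

int sim_current_ipl(void)
{
    return current_ipl;
}

/* A critical region bracketed exactly as described above.
 * Returns the level that was in force inside the region. */
int bump_counter(void)
{
    int s;
    int seen;

    s = sim_splhi();            /* enter critical region */
    seen = sim_current_ipl();
    shared_counter++;
    sim_splx(s);                /* exit critical region */
    return seen;
}
```

Because the saved value is restored rather than a fixed "low" level, calling bump_counter() from code that has already raised the level leaves that level intact on return.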
But maybe this code has come from BSD? No. Even in 1986, in 4.3BSD, malloc() had deviated significantly from the original:
/*
 * Allocate 'size' units from the given
 * map. Return the base of the allocated space.
 * In a map, the addresses are increasing and the
 * list is terminated by a 0 size.
 *
 * Algorithm is first-fit.
 *
 * This routine knows about the interleaving of the swapmap
 * and handles that.
 */
long
rmalloc(mp, size)
    register struct map *mp;
    long size;
{
    register struct mapent *ep = (struct mapent *)(mp+1);
    register int addr;
    register struct mapent *bp;
    swblk_t first, rest;
    if (size <= 0 || mp == swapmap && size > dmmax)
        panic("rmalloc");
    /*
     * Search for a piece of the resource map which has enough
     * free space to accomodate the request.
     */
    for (bp = ep; bp->m_size; bp++) {
        if (bp->m_size >= size) {
            /*
             * If allocating from swapmap,
             * then have to respect interleaving
             * boundaries.
             */
            if (mp == swapmap && nswdev > 1 &&
                (first = dmmax - bp->m_addr%dmmax) < bp->m_size) {
                if (bp->m_size - first < size)
                    continue;
                addr = bp->m_addr + first;
                rest = bp->m_size - first - size;
                bp->m_size = first;
                if (rest)
                    rmfree(swapmap, rest, addr+size);
                return (addr);
            }
            /*
             * Allocate from the map.
             * If there is no space left of the piece
             * we allocated from, move the rest of
             * the pieces to the left.
             */
            addr = bp->m_addr;
            bp->m_addr += size;
            if ((bp->m_size -= size) == 0) {
                do {
                    bp++;
                    (bp-1)->m_addr = bp->m_addr;
                } while ((bp-1)->m_size = bp->m_size);
            }
            if (mp == swapmap && addr % CLSIZE)
                panic("rmalloc swapmap");
            return (addr);
        }
    }
    return (0);
}
The origin of this code is still clearly recognizable, but the code has evolved. If we are to
believe SCO, even today, 17 years later, System V malloc(), a critical function, has
not evolved to this extent. In those 17 years, BSD malloc() has been completely
rewritten, while System V malloc() is essentially the same function as in the very
first C language implementation of 1973.
There are a number of things to note about this code:
>Is there any reason to replace this
>code?
>
Yes, it's ugly as hell. As far as I can see, the only user of ate_malloc are a few rmalloc calls. There is one rmalloc_align call, but afaics the function is not implemented.
The main differences in esr's approach are:
The System V and Linux versions really differ from the common ancestor 32V only in that they both contain mutual-exclusion locking, but it is implemented in significantly different ways, using different data structures.
Well, of course they'd use different functions and different data structures: they fit into different kernels. He also doesn't mention the almost identical ASSERT statements in System V and Linux, something missing in all the other versions. Mutual exclusion locking is an understandable thing to add, and as I commented, there's almost only one way to do it. But the ASSERTs are debugging tools which tend to get added after some problem shows up too often, and which then don't go away again (one reason for not removing them is that they're usually not enabled, so they don't take up any space in the executable).
I see nothing to question my statements above.
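The remark that disabled ASSERTs cost nothing in the executable can be illustrated with a sketch of how such a macro is typically built. This is not the System V or Linux definition: the DEBUG switch, the helper size_is_valid and the function request_units are assumptions made up for this example.

```c
#include <assert.h>   /* the standard assert() is removed the same way when NDEBUG is defined */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical kernel-style macro: active only in DEBUG builds. */
#ifdef DEBUG
#define ASSERT(cond)                                            \
    do {                                                        \
        if (!(cond)) {                                          \
            fprintf(stderr, "assertion failed: %s\n", #cond);   \
            abort();                                            \
        }                                                       \
    } while (0)
#else
#define ASSERT(cond) ((void)0)  /* expands to nothing: the condition is never evaluated */
#endif

int checks_evaluated;           /* counts how often the asserted condition actually ran */

int size_is_valid(long size)
{
    checks_evaluated++;
    return size >= 0;
}

/* Stand-in for an allocator entry point that checks its argument. */
long request_units(long size)
{
    ASSERT(size_is_valid(size));
    if (size == 0)
        return 0;
    return size;
}
```

Built without -DDEBUG, request_units(-3) simply returns and checks_evaluated stays at zero, which is why such checks tend to stay in the source once added. Built with -DDEBUG, the same call aborts with a message.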
esr also writes:
In retrospect, there was a clue in the Linux code all along that it had been copied from rather old sources: the register declarations. Those do by hand an optimization that modern C compilers do automatically, and most programmers lost the habit of inserting them in new code a good ten years ago. So the honest question is: where was Linux's atemalloc copied from?
This is baffling. In his diff, he shows that System V uses the register keyword as well, so I'm not sure what he's aiming at here. It's true that nobody uses this keyword in new code any more, but we're not talking about new code here. It's 30 years old.
esr makes other points:
Given this, there are two pieces of internal evidence that suggest the ancient code. One is that the function is split in two in SVr4 but single in ancient Unix and Linux.
This is true, but the split seems unimportant: In System V, the malloc() function is simply a wrapper for rmalloc(). We've already seen the comparison between rmalloc() and ate_alloc(). The function we're talking about here was called rmalloc() in System V, but since it's been renamed anyway, that's not of any significance.
A subtler indication that one change between SVr4 and Linux would remove a cast (in the second ASSERT call). It is quite unlikely that a programmer casually copying code would go to the effort to remove a cast, and a guilty copier wouldn't do it when there are ways to obscure similarities that are both easier and less likely to spawn subtle bugs. This is especially true since a more effective obscuring method would have been to remove the ASSERTs entirely; they are used for debugging rather than being neccessary to operation and could be readily dispensed with.
Agreed, the difference can't be accounted for as an attempt to obfuscate the source of the code; that's obvious enough. But removing the cast in the System V case would cause the code to fail: it's asserting that the value is less than 0x80000000. This is the smallest possible 32 bit signed number, so if we're doing a signed comparison, it will always fail. The point that esr has missed here is that this is 64 bit code, where this value has no particular meaning. It's possible that it was necessary to remove the unsigned to avoid a compiler warning, though I can't see why it should cause one, or for some similar reason, possibly including internal code auditing.
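The three readings of that comparison can be spelled out in a short sketch. It uses fixed-width types so that the intent is explicit, because the type a C compiler gives a bare 0x80000000 literal depends on the platform's integer widths. The function names are invented for this illustration, and the actual ASSERT lines from either kernel are not reproduced here.

```c
#include <stdint.h>

/* 32-bit signed reading: 0x80000000 is the bit pattern of INT32_MIN,
 * and nothing is smaller than that, so the test can never succeed. */
int below_limit_signed32(int32_t v)
{
    return v < INT32_MIN;
}

/* 32-bit unsigned reading (what a cast to unsigned gives):
 * true exactly when the top bit is clear. */
int below_limit_unsigned32(int32_t v)
{
    return (uint32_t)v < UINT32_C(0x80000000);
}

/* 64-bit signed reading: 0x80000000 is just the number 2147483648,
 * an ordinary value with no special meaning. */
int below_limit_signed64(int64_t v)
{
    return v < INT64_C(0x80000000);
}
```

Under the 32-bit signed reading the check fails for every input, under the unsigned reading it acts as a range check, and in 64-bit arithmetic it is a range check that needs no cast at all.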
Berkeley? Isn't that BSD? Well, sort of, it seems. It grew up around the BSD distributions, but it's not part of them. The license, however, is pure BSD. The code in question is indeed the same. It's quoted like this:
    pc += (A == pc->k) ? pc->jt : pc->jf;
    continue;
case BPF_JMP|BPF_JSET|BPF_K:
    pc += (A & pc->k) ? pc->jt : pc->jf;
    continue;
case BPF_JMP|BPF_JGT|BPF_X:
    pc += (A > X) ? pc->jt : pc->jf;
    continue;
case BPF_JMP|BPF_JGE|BPF_X:
    pc += (A >= X) ? pc->jt : pc->jf;
    continue;
case BPF_JMP|BPF_JEQ|BPF_X:
    pc += (A == X) ? pc->jt : pc->jf;
    continue;
case BPF_JMP|BPF_JSET|BPF_X:
    pc += (A & X) ? pc->jt : pc->jf;
Any programmer must cringe at the way this has been quoted. It should be pretty clear even to a
non-programmer that this code consists of groups of three lines. The first line describes a
condition, the second specifies an action to take, and the third (continue) tells the
program that that's all (and not to continue to the following line). But the people who
prepared the slides chopped off the first line of the first group, and the last line of the
last group.
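To show how those groups of three lines fit into a whole, here is a sketch of a toy interpreter with the same shape. It is not the BPF code: the opcodes, the struct toy_insn and the function toy_run are invented for this illustration, and only the case / action / continue structure mirrors the quoted fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy filter machine; not the real BPF encoding. */
enum toy_op { TOY_RET, TOY_JEQ_K, TOY_JGT_K, TOY_JSET_K };

struct toy_insn {
    enum toy_op   code;
    unsigned char jt;   /* instructions to skip if the test is true  */
    unsigned char jf;   /* instructions to skip if the test is false */
    unsigned int  k;    /* constant operand */
};

/* Run the program at pc with accumulator A and return the value of the
 * TOY_RET instruction that ends it. */
unsigned int toy_run(const struct toy_insn *pc, unsigned int A)
{
    assert(pc != NULL);
    for (;; pc++) {
        switch (pc->code) {
        case TOY_RET:
            return pc->k;
        case TOY_JEQ_K:                              /* line 1: the condition    */
            pc += (A == pc->k) ? pc->jt : pc->jf;    /* line 2: the action       */
            continue;                                /* line 3: next instruction */
        case TOY_JGT_K:
            pc += (A > pc->k) ? pc->jt : pc->jf;
            continue;
        case TOY_JSET_K:
            pc += (A & pc->k) ? pc->jt : pc->jf;
            continue;
        default:
            return 0;                                /* unknown opcode: reject */
        }
    }
}
```

Each continue ends the handling of one instruction and moves on to the next one, rather than falling through into the following case. A fragment that starts with an action line and stops before a continue therefore shows no complete unit of the interpreter.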
As Perens points out, this is not System V code. It's freely available for download on the Internet. The example above comes from the file bpf-1.2a1/net/bpf_filter.c. The license at the beginning of this file reads:
/*-
 * Copyright (c) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997
 * The Regents of the University of California. All rights reserved.
 *
 * This code is derived from the Stanford/CMU enet packet filter,
 * (net/enet.c) distributed as part of 4.3BSD, and code contributed
 * to Berkeley by Steven McCanne and Van Jacobson both of Lawrence
 * Berkeley Laboratory.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 * notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 * notice, this list of conditions and the following disclaimer in the
 * documentation and/or other materials provided with the distribution.
 * 3. All advertising materials mentioning features or use of this software
 * must display the following acknowledgement:
 * This product includes software developed by the University of
 * California, Berkeley and its contributors.
 * 4. Neither the name of the University nor the names of its contributors
 * may be used to endorse or promote products derived from this software
 * without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 * @(#)bpf.c 7.5 (Berkeley) 7/15/91
 */
How could SCO miss such a glaring indication that it wasn't their code? I think the answer is simple: it's no longer there. I suggest that SCO has removed the license conditions, in direct contravention of paragraph 1 of the license. This suggests that, far from proving any fault in Linux, it has pointed to SCO abusing the BSD license.
On 3 September 2003, during the AUUG 2003 conference, I participated in a panel discussion with Kieran O'Shaughnessy, the General Manager of SCO Australia, and Con Zymaris, an Australian open source activist. I asked Kieran this question (“How could you miss the BSD license?”), and he replied that this was not supposed to be evidence of real System V code in Linux, just a demonstration of the techniques involved. At first I thought he was just trying to worm his way out of the issue, but it seems to be the party line; I'll chase down other references when I have time. In the meantime, look at slide 15 of the briefing and decide whether you think that this was their intention. I have difficulty getting past the conclusion:
The first example appears to indicate that SCO, far from being an industry leader in UNIX technology, still uses the original, primitive version of malloc(), a central kernel function, a version which everybody else gave up years ago.
The second example says nothing about Linux, since it's obviously not SCO code. It does, however, suggest that SCO is abusing the BSD license.
Presumably SCO thinks these are some of the best examples. If this is the best they have to offer, they don't have a leg to stand on.