
Resetting PHP 6


By Jonathan Corbet
March 24, 2010
Rightly or wrongly, many in our community see Perl 6 as the definitive example of vaporware. But what about PHP 6? This release was first discussed by the PHP core developers back in 2005. There have been books on the shelves purporting to cover PHP 6 since at least 2008. But, in March 2010, the PHP 6 release is not out - in fact, it is not even close to out. Recent events suggest that PHP 6 will not be released before 2011 - if, indeed, it is released at all.

PHP 6 was, as befits a major release, meant to bring some serious changes to the language. To begin with, the safe_mode feature which is the whipping boy for PHP security - or the lack thereof - will be consigned to an unloved oblivion; the "register_globals" feature will be gone as well. The proposed traits feature would bring "horizontal reuse" to the language; think of traits as a PHPish answer to multiple inheritance or Java's interfaces. A new 64-bit integer type is planned. PHP was slated to gain a goto keyword (though the plan was to avoid the scary goto name and add target labels to break instead). Some basic static typing features are under consideration. There was even talk of adding namespaces to the language and making function and class names be case-sensitive.

The really big change in PHP 6, though, was the shift to Unicode throughout. Anybody who is running a web site which does not use Unicode is almost certainly wishing that things were otherwise - trust your editor on this one. It is possible to support Unicode to an extent even if the language in use is not aware of Unicode, but it is a painful and error-prone affair; proper Unicode support requires a language which understands Unicode strings. The PHP 6 plan was to support Unicode all the way:

PHP6 will have Unicode support everywhere; in the engine, in extensions, in the API. It's going to be native and complete; no hacks, no external libraries, no language bias. English is just another language, it's not the primary language.

Unicode, however, appears to be the rock upon which the PHP 6 ship ran aground. Despite claims back in 2006 that the development process was "going pretty well," it seems that few people are happy with the state of Unicode support in PHP. Memory usage is high, performance is poor, and broken scripts are common. The project has been struggling for some time to find a solution to this problem.

From your editor's reading of the discussion, the fatal mistake would appear to be the decision to use the two-byte UTF-16 encoding for all strings within PHP. According to PHP creator Rasmus Lerdorf, this decision was made to ease compatibility with the International Components for Unicode (ICU) library:

Well, the obvious original reason is that ICU uses UTF-16 internally and the logic was that we would be going in and out of ICU to do all the various Unicode operations many more times than we would be interfacing with external things like MySQL or files on disk. You generally only read or write a string once from an external source, but you may perform multiple Unicode operations on that same string so avoiding a conversion for each operation seems logical.

But a lot of strings simply pass through PHP programs; in the end, the conversion turned out to be more expensive and less convenient than had been hoped. Johannes Schlüter describes the problem this way:

By using UTF-16 as default encoding we'd have to convert the script code and all data passed from or to the script (request data, database results, output, ...) from another encoding, usually UTF-8, to UTF-16 or back. The need for conversion doesn't only require CPU time and more memory (a UTF-16 string takes double memory of a UTF-8 string in many cases) but makes the implementation rather complex as we always have to figure out which encoding was the right one for a given situation. From the userspace point of view the implementation brought some backwards compatibility breaks which would require manual review of the code.

These all are pains for a very small gain for many users where many would be happy about a tighter integration of some mbstring-like functionality. This all led to a situation for many contributors not willing to use "trunk" as their main development tree but either develop using the stable 5.2/5.3 trees or refuse to do development at all.
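The memory claim in the quote above is easy to check with a short sketch. The sample strings are illustrative assumptions, not data from the PHP discussion, and the helper name `encoded_sizes` is hypothetical. For mostly-ASCII text (typical of markup and script source) UTF-16 needs twice the bytes of UTF-8; for some scripts the ratio is smaller or even reversed.

```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
# Sample strings are assumptions chosen for illustration only.
samples = {
    "ascii": "Hello, world",
    "russian": "Привет, мир",
    "japanese": "こんにちは世界",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: {len(text)} code points, "
          f"{utf8} bytes as UTF-8, {utf16} bytes as UTF-16")
```

The doubling applies to the ASCII case, which is why the quote says "in many cases" and not "always".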

The end result of all this is that PHP 6 development eventually stalled. The Unicode problems made a release impossible while blocking other features from showing up in any PHP release at all. Eventually some work was backported to 5.3, but that is always a problematic solution; it brings back memories of the 2.5 kernel development series.

Developer frustration, it seems, grew for some time. Last November, Kalle Sommer Nielsen tried to kickstart the process, saying:

I've been thinking for a while what we should do about PHP6 and its future, because right now it seems like there isn't much future in it.

Things came to a head on March 11, when Jani Taskinen, fed up with being unable to push things forward, (1) committed some disruptive changes to the stable 5.3 branch, and (2) created a new PHP_5_4 branch which looked like it was meant to be a new development tree. That is when Rasmus stepped in:

The real decision is not whether to have a version 5.4 or not, it is all about solving the Unicode problem. The current effort has obviously stalled. We need to figure out how to get development back on track in a way that people can get on board. We knew the Unicode effort was hugely ambitious the way we approached it. There are other ways.

So I think Lukas and others are right, let's move the PHP 6 trunk to a branch since we are still going to need a bunch of code from it and move development to trunk and start exploring lighter and more approachable ways to attack Unicode.

And that is where it stands. The whole development series which was meant to be PHP 6 has been pushed aside to a branch, and development is starting anew based on the 5.3 release. Anything of value in the old PHP 6 branch can be cherry-picked from there as need be, but the process of what is going into the next release is beginning from scratch, and one assumes that proposals will be looked at closely. There are no timelines or plans for the next release at this point; as Rasmus explains, that's not what the project needs now:

We don't need timelines right now. What we need is some hacking time and to bring some fun back into PHP development. It hasn't been fun for quite a while. Once we have a body of new interesting stuff, we can start pondering releases...

So timing and features for the next PHP release are completely unknown at this point. Even the name is unknown; Jani's 5.4 branch has been renamed to THE_5_4_THAT_ISNT_5_4. There has been some concern about all of those PHP 6 books out there; it has been suggested that a release which doesn't conform to expectations for PHP 6 should be called something else - PHP7, even. There's little sympathy for the authors and publishers of those books, but those who bought them may merit a little more care. But that will be a discussion for another day. Meanwhile, the PHP hackers are refocusing on getting things done and having some fun too.



Resetting PHP 6

Posted Mar 24, 2010 16:27 UTC (Wed) by amk (subscriber, #19) [Link] (6 responses)

"Unicode: everyone wants it, until they get it." -- Barry Warsaw of the Python developers, written when Python 1.6/2.0's Unicode support was being built.

Resetting PHP 6

Posted Mar 24, 2010 19:29 UTC (Wed) by niner (guest, #26151) [Link] (5 responses)

But Python's Unicode support is painful anyway, so those guys may not be
an adequate source of opinions...

Resetting PHP 6

Posted Mar 24, 2010 22:12 UTC (Wed) by HelloWorld (guest, #56129) [Link] (4 responses)

What's wrong with Python 3's Unicode support?

Resetting PHP 6

Posted Mar 26, 2010 3:51 UTC (Fri) by spitzak (guest, #4593) [Link] (3 responses)

Python 3 is doing the EXACT SAME STUPID MISTAKE. It is going to be a disaster and the developers are too blinded to realize it.

There will be the annoying overhead of converting every bit of data on input and output. But far more important will be the fact that errors in the UTF-8 will either be lost or will cause exceptions to be thrown, producing a whole universe of ugly bugs and DOS attacks. This is going to suck bad!

Strings should be UTF-8 and string[n] should return the n'th byte in the string. That is the TRUTH and Microsoft and Python and PHP and Java and everybody else is WRONG.

But how do I get the N'th character???? You are probably sputtering this nonsense question right now, right? You need to ask yourself: where did "N" come from? I can guarantee you it came from an iterative process that looked at every character between some other point and this new point. The proper interface to look at "characters" is ITERATORS. They can move by one in each direction in O(1) time. And different iterators can return composed or decomposed characters, and if the byte is an error they can clearly return that error and also return suggested replacement values.

Unfortunately Unicode and UTF and perhaps some kind of politically-correct rule that we can only have equality and world peace if some people don't get the "better" shorter encodings, seem to turn quite intelligent programmers into complete morons. Or more like idiot savants: they are dangerously talented enough to write these horrible things and foist them on everybody.
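The iterator interface argued for in this comment can be sketched roughly as follows. This is a minimal illustration, not code from any of the projects mentioned: the function name `iter_code_points` is hypothetical, indexing the raw `bytes` object stands in for "string[n] returns the n'th byte", and an undecodable byte is reported as `None` instead of raising an exception.

```python
def iter_code_points(data: bytes):
    """Yield (byte_offset, code_point) pairs from UTF-8 encoded bytes.

    A byte that does not start a well-formed sequence is yielded as
    (offset, None) and skipped, so errors are reported and not lost.
    """
    i = 0
    n = len(data)
    while i < n:
        first = data[i]
        # Sequence length implied by the lead byte.
        if first < 0x80:
            length = 1
        elif 0xC2 <= first <= 0xDF:
            length = 2
        elif 0xE0 <= first <= 0xEF:
            length = 3
        elif 0xF0 <= first <= 0xF4:
            length = 4
        else:
            length = 0  # stray continuation byte or invalid lead byte
        chunk = data[i:i + length]
        try:
            if length == 0 or len(chunk) < length:
                raise ValueError("malformed or truncated sequence")
            # Strict decoding also rejects overlong forms and surrogates.
            code_point = ord(chunk.decode("utf-8"))
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            yield i, None
            i += 1
            continue
        yield i, code_point
        i += length


data = "aé€".encode("utf-8") + b"\xff" + "😀".encode("utf-8")
print(data[1])  # plain byte access: the first byte of "é"
for offset, cp in iter_code_points(data):
    print(offset, "error" if cp is None else f"U+{cp:04X}")
```

Stepping forward is a constant-time operation per code point because a sequence is at most four bytes long; a backward iterator would rely on the same property by skipping continuation bytes.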

Resetting PHP 6

Posted Mar 27, 2010 0:52 UTC (Sat) by jra (subscriber, #55261) [Link]

Hear hear. I merged in the original wide character support for Samba, done by the Japanese. Eventually we moved to a utf8-based solution (coded by tridge, naturally :-) with iterators for manipulating the strings. It's the only thing that makes sense.

Jeremy.

Resetting PHP 6

Posted Mar 31, 2010 16:50 UTC (Wed) by anton (subscriber, #25547) [Link] (1 responses)

> Strings should be UTF-8 and string[n] should return the n'th byte in the string. That is the TRUTH and Microsoft and Python and PHP and Java and everybody else is WRONG.

I guess Forth does not belong to "everybody else", then, because we are going in the direction you suggest. The ideas are probably best explained in an early paper, but if you want to know where this went, look at the current (frozen) proposal.

Resetting PHP 6

Posted Mar 31, 2010 17:49 UTC (Wed) by spitzak (guest, #4593) [Link]

I strongly agree with Forth's solution. The postscript paper describes exactly how easy it was to use UTF-8 if you stop panicking about "characters" and realize that they are just like words and nobody worries that you can't find the ends of words in O(1) time. The listing of the number of lines changed should be very instructive. I hope everybody saying I am wrong might read the paper.

Forth's solution appears to have an iterator return an object that they call an "xchar" which is a Unicode code point. I believe such an object is easily extended to return "UTF-8 encoding error" as a different value. You can also make different iterators to return composed or decomposed characters, and to automatically convert UTF-8 errors to CP1252 equivalents, which (though unsafe) will remove any need to "identify the character encoding" since this will reliably recognize UTF-8, ISO-8859-1, and CP1252 automatically, even if variations are pasted together.
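The "convert UTF-8 errors to CP1252 equivalents" idea can be sketched in a few lines. This is an assumption-laden illustration, not the Forth proposal's API: the function name is hypothetical, and the handful of bytes that CP1252 leaves undefined are mapped to U+FFFD here.

```python
def decode_utf8_with_cp1252_fallback(data: bytes) -> str:
    """Decode data as UTF-8, reinterpreting each invalid byte as CP1252."""
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice that was being decoded.
            bad = pos + err.start
            pieces.append(data[pos:bad].decode("utf-8"))
            # Undefined CP1252 bytes (such as 0x81) become U+FFFD.
            pieces.append(data[bad:bad + 1].decode("cp1252", errors="replace"))
            pos = bad + 1
    return "".join(pieces)


mixed = ("naïve ".encode("utf-8")
         + "café".encode("cp1252")
         + " €".encode("utf-8"))
print(decode_utf8_with_cp1252_fallback(mixed))
```

The "unsafe" caveat in the comment still applies: a CP1252 byte pair that happens to form a valid UTF-8 sequence will be read as UTF-8.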

Resetting PHP 6

Posted Mar 24, 2010 16:44 UTC (Wed) by cdamian (subscriber, #1271) [Link] (2 responses)

I am glad about the decision, any decision is better than no movement at all.

I also think choosing UTF-16 was wrong, most of the code, html and databases out in the real world are UTF-8 and choosing anything else is just silly.

I was at the OSCON in 2000 when Perl6 was announced and at that time I was working in London on a large Perl project and I wasn't convinced that it was a good idea. These kinds of rewrites and revolutions usually take too much time and destroy your current user base if you are not careful. And while the rewrite happens the whole world keeps on moving. And if you make your users change all their code, they might as well change to a new language or system.

Some other projects trying the impossible:

- Python3 (though not a rewrite, but still slow adoption at the moment)
- typo3 5
- Doctrine 2 (small enough that it might work)
- Symfony 2 (hopefully with some migration path in the future)

Change is good, but too much change not always so.

Resetting PHP 6

Posted Mar 24, 2010 17:40 UTC (Wed) by drag (guest, #31333) [Link] (1 responses)

Adoption of new Python releases has always been slow. I'm using Python 2.5 mostly since that is what is used in Debian unstable by default right now. Python2.6 was first released in 2008.

Of course Python2.6 and Python3 are available. I just try to write things so as to minimize the effort it takes to port it to newer versions, which for me is acceptable since none of it is really very complex. Larger projects are going to have larger problems, of course.

The big difference between Perl 6, PHP 6, and Python3 is that Python3 is out right now, available, has a bunch of transition tools, code is somewhat backwards compatible to 2.6, and it's had a couple stabilizing releases.

Also it's not just about the Unicode support... Strings in Python 2.x were heavily overloaded and used for _everything_... (the major alternative being the array module, which is a wrapper around C arrays and ends up being slower for most things than native Python data types). Being forced to use data encoded into ASCII strings for everything has grown quite painful. Especially since every year less and less of your data is actually going to be stored in ASCII strings! It's all UTF-8 or binary. Having _all_ strings be Unicode while introducing the byte data type is a godsend for a lot of things I need Python for. Keeps things simple, clean and fast.

Backwards compatibility

Posted Mar 27, 2010 22:53 UTC (Sat) by man_ls (guest, #15091) [Link]

> The big difference between Perl 6, PHP 6, and Python3 is that Python3 is out right now, available, has a bunch of transition tools, code is somewhat backwards compatible to 2.6, and it's had a couple stabilizing releases.

But "somewhat backwards compatible" is not good enough. For any non-trivial applications you still need to test everything again, and probably do some coding + testing + deploying. In business settings it translates to money and pains; in volunteer projects just pains.

Even when backwards compatibility is a requirement, like for Java (where the rare breakages are clearly signaled and known by everyone), testing time for new versions has to be allocated. With Python, migrations are a showstopper for most people unless the new version somehow provides great advantages (which for me it doesn't). For developers of the language itself and the runtime, the supposed benefits of not having to be backwards compatible are probably offset by having to support two or three versions indefinitely.

just kill it

Posted Mar 24, 2010 17:08 UTC (Wed) by b7j0c (guest, #27559) [Link] (18 responses)

actually "fixing" php would require creating a language syntax and a runtime incompatible with php5, in which case rasmus et al might as well cede the future to a better-designed general purpose scripting language like python or even javascript

as it stands, the php world is already fracturing. facebook, the most prominent user of php, is moving to their own c++ based hiphop toolchain...which effectively means they have forked php and can make language-level changes if they want. i presume the abysmal memory bloat and performance of the stock php runtime have induced this change. i doubt facebook devs even care what rasmus does at this point.

but let's not leave the language syntax out here. php's enthusiastically juvenile syntax is only appropriate for the most novice coders. everyone else with any experience rapidly hits the wall with the language. i don't even want to know how the php team would fix this, their current language syntax decisions indicate they have no business designing languages.

rasmus, it's time to admit that php has reached the end of its effective life. put php5 into support mode and encourage the use of better languages with better runtimes.

just kill it

Posted Mar 24, 2010 17:45 UTC (Wed) by clump (subscriber, #27801) [Link] (1 responses)

> rasmus, it's time to admit that php has reached the end of its effective life. put php5 into support mode and encourage the use of better languages with better runtimes.

Seems a little harsh. Can you point to any examples where what you suggest has happened?

just kill it

Posted Mar 24, 2010 18:31 UTC (Wed) by b7j0c (guest, #27559) [Link]

i'm not sure what you mean by examples. my basic point is that fixing php would mean effectively scrapping it. considering the debacle over basic language decisions like namespaces, it's clear that php5 cannot be "patched"

just kill it

Posted Mar 24, 2010 18:48 UTC (Wed) by ikm (guest, #493) [Link] (1 responses)

I think PHP began as a simple preprocessing language. In C/C++, when you have an .h file you want to include into each and every .c file, you use the preprocessor directive "#include". What if you want the same thing for html? Say, have all pages the same header or footer? That's right, put an "include" or "require" statement inside of your .html, rename it to have the .php extension - and you're done. That's the actual originating use of PHP, I presume. Of course it continued from then, but I think it never really evolved into an actual programming language -- rather than that, it stayed the preprocessing one.

So basically, when you have a lot of html and only need simple structuring (e.g. make them use a single header or footer), you use PHP. If you're doing something much more complex, you'd probably be better with some other language. Therefore I think PHP has its niche and it isn't going anywhere.

just kill it

Posted Mar 25, 2010 4:25 UTC (Thu) by BenHutchings (subscriber, #37955) [Link]

SSI already provided 'include' functionality. People wanted more features and PHP provided them, sadly without much of an overall design.

just kill it

Posted Mar 24, 2010 19:07 UTC (Wed) by jwarnica (subscriber, #27492) [Link] (1 responses)

There are lots of things wrong with "PHP", but syntax is hardly one of them.

PHP syntax is far more expressive than, say, Java. And far less annoying than the sigil hell of Perl.

It sucks that the core libraries seem to have random parameter ordering. It sucks that there are a lot of brain-dead PHP apps out there. It sucks that there are a lot of brain-dead PHP coders out there. But syntax? That is the least of PHP's problems.

just kill it

Posted Mar 24, 2010 22:07 UTC (Wed) by elanthis (guest, #6227) [Link]

The PHP syntax sucks, and I say that as someone who has worked with it
professionally since 2000. Saying that some other languages are worse is
irrelevant; by that kind of reasoning, PHP is perfect in every way because
some other language out there has surely done it all even worse.

PHP 5.3 namespace syntax? The fact that it _just_ finally got real
closures? Unknown identifiers are treated as strings? Function calls
can't be used in any general expression (e.g., foo()->bar() does not work,
but $tmp = foo(); $tmp->bar() does) ? No pass-by-name function parameters
(for no logical reason, just "it's not the PHP way") ? Type hinting for
objects and arrays only? Inconsistent special operator names?

Granted, you're right, the syntax is not PHP's biggest problem. The entire
implementation is its biggest problem. The compiler is crap, buggy,
unpredictable, and can't deal with any kind of failure (no way to catch and
gracefully fail on a parse error when including another file, for example).
The language is slow. The C API is hideous. Many of those syntax warts
are all but forced by the internal implementation, which itself grew out of
a lack of abstraction between the crappy original syntax and the runtime
engine.

If I were to do a PHP 6, it would be just a cleanup of the internals, all
deprecated APIs removed, and something very clean and easy to build upon so
that 6.1, 6.2, 6.3, etc. can start delivering at a higher quality.

That is basically what PHP 5 ended up being, and look how much life that
breathed into PHP.

just kill it

Posted Mar 24, 2010 19:19 UTC (Wed) by robla (guest, #424) [Link] (10 responses)

"...their current language syntax decisions indicate they have no business designing languages."

In your world, who should decide who gets to design programming languages?

just kill it

Posted Mar 25, 2010 5:11 UTC (Thu) by b7j0c (guest, #27559) [Link] (9 responses)

anyone else

I guess I'm just not so cavalier....

Posted Mar 25, 2010 5:42 UTC (Thu) by robla (guest, #424) [Link] (8 responses)

...about trash-talking other people's hard work. PHP allowed a lot of
people to start programming who might not have ever gotten slurped into
programming. The apps created by those new developers (e.g. Wordpress,
MediaWiki, Drupal, etc) are applications that are running some of the
highest traffic websites in the world. While those applications could have
been written by "real" programmers in "good" programming languages, the
fact of the matter is that the "real" programmers just didn't have the time
or inclination to write those apps. So, now we have a lot of great
applications that may not be so pretty under the hood, but they often still
do the job better than any other application out there, and may not have
otherwise existed had PHP not been around.

Want to see something kinda funny? Check out http://www.haskell.org/ I
doubt you're going to find too many people out there that would lump
Haskell in with the "bad" languages. In fact, it could very well be
something a lot of us are writing production apps in a decade from now.
However, guess what they're using to host haskell.org...:
http://www.haskell.org/haskellwiki/Special:Version

I guess I'm just not so cavalier....

Posted Mar 25, 2010 14:22 UTC (Thu) by Simetrical (guest, #53439) [Link] (6 responses)

Please don't cite MediaWiki as an example of a great app written in PHP.
Pretty much all of us MediaWiki developers hate the language passionately
and wish we were using something else. (Although, not all of us agree on
what that something else should be.)

PHP is *not* easier to learn than, say, Python. That's just not true IMO.
And it's definitely not true that MediaWiki wouldn't have existed if not for
PHP. phase2 was written in Perl, IIRC, and it was a couple of people's
decision to pick PHP for phase3 -- it would have been written either way.

I guess I'm just not so cavalier....

Posted Mar 25, 2010 18:21 UTC (Thu) by robla (guest, #424) [Link] (5 responses)

Sorry, I'd forgotten about the UseModWiki days, though I'm going to bet that the anti-PHP crowd here doesn't really have a higher opinion of Perl. At any rate, Wordpress and Drupal still qualify, and there's a ton of other really useful software that falls into that category.

The thing that PHP has historically had going for it is mod_php, which was for a very long time way better than mod_perl and mod_python. It had the added benefit of being turned on by default in many contexts (e.g. cheap web hosts). That sort of availability made web programming a lot more accessible to a lot more people. That's not really a triumph of language design so much as interpreter design, but I do find it peculiar that Perl and Python couldn't beat PHP in this area, given the long headstarts they had.

Speaking of interpreter design, I think Python's Global Interpreter Lock is something that bears every bit as much scrutiny as any of PHP's deficiencies. While I'm not interested in starting a PHP vs Python flamewar (I happen to be primarily programming in Python these days), I think this just goes to show you that there are always tradeoffs in picking languages.

I guess I'm just not so cavalier....

Posted Mar 25, 2010 19:58 UTC (Thu) by Simetrical (guest, #53439) [Link] (4 responses)

If PHP hadn't existed, web hosts would be using something else instead. Probably something based on Unix permissions instead of things like open_basedir and max_timeout that try to enforce permissions or resource limits in userspace, thereby prohibiting perfectly sane things like shelling out to other programs.

If web hosts used something else, web apps would be written in something else. It's that simple. Wordpress and Drupal are web apps that happen to be written in PHP, not consequences of PHP's existence. I'd bet that they're written in PHP because that's how you reach the largest audience, because that's what webhosts use.

The Python GIL is a nonissue if you're running single-threaded code. Does PHP support multithreaded execution at *all*?

I guess I'm just not so cavalier....

Posted Mar 25, 2010 21:18 UTC (Thu) by foom (subscriber, #14868) [Link] (1 responses)

> The Python GIL is a nonissue if you're running single-threaded code.

Not really...Python doesn't properly support multiple distinct interpreters within a process -- you
can do it, but they aren't properly isolated from each-other. One important way they aren't isolated:
they all share the same GIL. So, you can't even properly run multiple single-threaded python
interpreters within a multithreaded process. It works, but only one thread can actually run at a time,
across all interpreters.

So of course that means you can't run python (efficiently) within a threaded apache.

I guess I'm just not so cavalier....

Posted Mar 25, 2010 22:31 UTC (Thu) by Simetrical (guest, #53439) [Link]

As far as I know, many/most PHP modules don't work at all with a threaded
Apache, and it's generally advised that mod_php users stick to prefork or
FastCGI. So this isn't a big advantage for PHP.

I guess I'm just not so cavalier....

Posted Mar 25, 2010 23:29 UTC (Thu) by JoeF (guest, #4486) [Link] (1 responses)

> Does PHP support multithreaded execution at *all*?

Yes, it does.
But a large part of the third-party modules are not thread-safe, so unless you limit yourself to what you can run (and test the hell out of things), you are better off not running a multithreaded build of Apache.

I guess I'm just not so cavalier....

Posted Mar 26, 2010 13:51 UTC (Fri) by foom (subscriber, #14868) [Link]

I've kinda wondered how exaggerated this problem is. I mean, the default on windows is threaded --
do most modules blow up by default on windows? That seems like a problem that their authors
would want to fix.

I guess I'm just not so cavalier....

Posted Mar 25, 2010 17:12 UTC (Thu) by b7j0c (guest, #27559) [Link]

you're getting offtopic

is it possible to write a great app in php? yes

have some people done it? yes

do most people who have done it want to consign php to the dustbin of history? YES

just kill it

Posted Mar 25, 2010 14:25 UTC (Thu) by Simetrical (guest, #53439) [Link]

Wikipedia is one of the other biggest users of PHP, and we're probably going
to move to Hiphop when it matures somewhat. Domas Mituzas, volunteer
performance engineer for Wikipedia for the last several years, also happens
to be a DBA at Facebook. Since MediaWiki is used so widely, though, we'd
have to still be compatible with stock PHP.

(This is just a guess, though, not any official statement -- I have nothing
to do with Wikipedia systems administration, only MediaWiki development.)

Resetting PHP 6

Posted Mar 24, 2010 18:35 UTC (Wed) by ikm (guest, #493) [Link] (3 responses)

UCS-4 (some call it UTF-32) allows random access to individual code points, but this access isn't really always needed, and the waste is great. UTF-16 has none of the advantages of UTF-8, but all of its disadvantages. It seems logical therefore to operate almost solely on UTF-8. For that, the language should have utf8 string iterators, store the string's logical length, and so on. Problem is, to make sure no programmers' errors slip through, one should exclude any support for direct 8-bit string manipulations from it. You may not e.g. be able to cut such strings at arbitrary 8-bit boundaries, and shouldn't even know their 8-bit sizes. The string would then actually feel like a UCS-4 string -- only without random access. This feels quite limiting, but I think that would still be the right approach. If an 8-bit string is needed, there should be ways to convert/project -- but the distinction must be stark. If, on the other hand, direct random access to UCS-4 data is required, the string could temporarily convert itself to UCS-4 under the hood, and then later shrink back to UTF-8.

This would look like the right approach to me.
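A string type along those lines might look like the sketch below. Everything here is hypothetical: the class name `Utf8Text` and its methods are made up for illustration, and Python's own `str` is used only as a convenient source of code points.

```python
class Utf8Text:
    """Hypothetical string type: UTF-8 storage, code-point interface only."""

    __slots__ = ("_data", "_length")

    def __init__(self, text: str):
        self._data = text.encode("utf-8")  # compact internal storage
        self._length = len(text)           # logical length, stored once

    def __len__(self):
        # Length in code points; the byte size is deliberately not exposed here.
        return self._length

    def __iter__(self):
        # Sequential access to code points, one character at a time.
        return iter(self._data.decode("utf-8"))

    def to_bytes(self) -> bytes:
        """Explicit, clearly named projection to an 8-bit string."""
        return self._data

    def to_code_points(self) -> list:
        """Temporary fixed-width (UCS-4 style) view for random access."""
        return [ord(ch) for ch in self]


greeting = Utf8Text("Привет")
print(len(greeting), len(greeting.to_bytes()))
print(hex(greeting.to_code_points()[0]))
```

There is no `__getitem__`, so byte-level slicing is impossible without going through `to_bytes()`, which is the "stark distinction" the comment asks for.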

Resetting PHP 6

Posted Mar 24, 2010 19:26 UTC (Wed) by mrshiny (guest, #4266) [Link]

Just want to point out that a useful language will always have ways for programmers to screw up character encodings. In Java a char is distinct from a byte, and yet people do someString.getBytes("UTF-8") (to get the bytes when utf-8 encoded) then proceed to treat each byte as if it represents a letter. Since you can't take away the ability to write character data into an arbitrary encoding, you can't take away this particular failure mode. Character encodings should be taught in school as an object lesson in the consequences of data storage decisions.
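The failure mode described here is easy to reproduce in any language that separates bytes from characters. The sketch below uses Python instead of Java, and the function name is hypothetical; it treats each UTF-8 byte as if it were one letter.

```python
def misread_bytes_as_chars(text: str) -> str:
    """Encode text as UTF-8, then wrongly treat every byte as one character."""
    utf8_bytes = text.encode("utf-8")
    return "".join(chr(b) for b in utf8_bytes)


word = "café"
print(len(word), len(word.encode("utf-8")))  # character count vs byte count
print(misread_bytes_as_chars(word))          # the two bytes of "é" show up as two letters
```

The type system does not help: both steps are legal operations, and only the intent is wrong.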

Resetting PHP 6

Posted Mar 26, 2010 4:02 UTC (Fri) by spitzak (guest, #4593) [Link] (1 responses)

You are seriously overestimating the damage of "cutting a string at an arbitrary byte".

First of all, the primary thing that happens in real programs is that the halves of the string get pasted back together, such as when fixed-sized blocks are copied from one file to another. That does not destroy UTF-8 at all.

Second, why is breaking a "character" really such a disaster? Why are we not worried about breaking "words"? If I split an English word in half I will probably get two non-words. How can I possibly safely use a computer language that allows such things? Why, it seems hard to believe that word processors could be written when the computer would allow this horrible ability! /sarcasm

Worrying about "breaking characters" is actually stupid, and is being used as an excuse to defend the bone-headed decision to use "wide characters".

Resetting PHP 6

Posted Mar 26, 2010 9:51 UTC (Fri) by ikm (guest, #493) [Link]

> First of all, the primary thing that happens in real programs is that the halves of the string get pasted back together

No, your example doesn't count -- this isn't string splitting, your resulting strings are intact there. The primary thing that happens in real programs is that they try to shorten the string, e.g. make "A very long string" into something like "A very lo...", to squeeze it in e.g. a fixed space of 12 characters, or do similar transformations. Those transformations can't be done correctly on raw 8-bit utf-8 strings.

> why is breaking a "character" really such a disaster? Why are we not worried about breaking "words"?

Because you're breaking the underlying encoding of the characters, not the characters themselves. The resulting bitstream would be an invalid utf-8 sequence. Parts of English words you split would be rendered intact just fine, but damaged and invalid utf-8 would either result in no display at all, or in program/library barf. You can safely combine valid utf-8 sequences together, but you can't arbitrarily cut them and expect the result to be valid.

> Worrying about "breaking characters" is actually stupid, and is being used as an excuse to defend the bone-headed decision to use "wide characters".

As a Russian, I actually know how important this is. I've seen enough non-utf8 aware programs and observed enough of their horrendous problems to understand the importance of wide characters. What makes you so bold in your statements? You seem to know nothing about the topic.

Two for two!!

Posted Mar 24, 2010 19:36 UTC (Wed) by dskoll (subscriber, #1630) [Link]

We develop a commercial piece of software using (primarily) Perl and PHP. It seems we've successfully jinxed both of them! :-)

/me mutters something about "should've stuck to C...."

Resetting PHP 6

Posted Mar 25, 2010 5:54 UTC (Thu) by branden (guest, #7029) [Link] (12 responses)

Unless it can rationalize its BS licensing, PHP should die. I've been boycotting the language since version 4 came out (with the--OOOH!---"Zend Engine") and see no reason to stop.

Resetting PHP 6

Posted Mar 25, 2010 10:51 UTC (Thu) by djzort (guest, #57189) [Link] (11 responses)

so is perl 6 likely to be released first?

Resetting PHP 6

Posted Mar 25, 2010 16:58 UTC (Thu) by JoeF (guest, #4486) [Link] (2 responses)

Only after Duke Nukem Forever is released ;-)

Resetting PHP 6

Posted Mar 26, 2010 9:52 UTC (Fri) by ikm (guest, #493) [Link] (1 responses)

It was officially cancelled.

Resetting PHP 6

Posted Mar 27, 2010 12:13 UTC (Sat) by HelloWorld (guest, #56129) [Link]

It wasn't. On
http://www.shacknews.com/onearticle.x/61747
it says:
"we've never said that Duke Nukem Forever has ceased development,"

Resetting PHP 6

Posted Mar 25, 2010 17:15 UTC (Thu) by chromatic (guest, #26207) [Link] (6 responses)

Perl 6 exists and is available today.

Rakudo (one of several Perl 6 implementations) had its 27th release last week. Rakudo also shipped for the first time in Fedora 12.

Resetting PHP 6

Posted Mar 26, 2010 13:20 UTC (Fri) by Darkmere (subscriber, #53695) [Link] (5 responses)

I'll believe it when I can go to perl.org and see "current version" being something other than 5.xx.x , perhaps even perl 6.0.0.

Until then, Perl is at 5.x.

Rakudo is something different, a Perl-like language, perhaps a steppingstone for future Perl technology. But it isn't Perl 6.0 to this member of the audience. It is Rakudo. Not Perl.

Resetting PHP 6

Posted Mar 26, 2010 19:05 UTC (Fri) by chromatic (guest, #26207) [Link] (2 responses)

Aren't you making an ontological argument (Perl 6 doesn't exist, because it hasn't been released, because the text on a website says that Perl 5.10.1 is the current version of Perl) based on a definitional fallacy (you will believe that Rakudo is a Perl 6 implementation when the text on a specific website changes)?

Perl.com didn't mention Perl 5.10.1 for several months. Which has precedence, perl.org or perl.com? Which has precedence with regard to Perl 6, perl.org or perl6.org?

I can understand that you don't want to download or use a Perl 6 implementation such as Rakudo until it meets certain criteria, and I can understand that a big shiny Download Now button is such a criterion for certain classes of users, but I don't understand how an HTML change to add a download button somehow flips the switch from "The software does not exist as its developers claim it does" to "Oh, now it really exists," at least for a project which isn't itself solely a download button.

Resetting PHP 6

Posted Mar 27, 2010 20:06 UTC (Sat) by bronson (subscriber, #4806) [Link]

There's a difference between "available for use as an experiment" and "available for use as Perl." If perl.org doesn't link to Perl6 from its home page, then one would guess that Perl6 isn't available for general use.

And one would be right.

No need to get all insulty with big shiny download buttons.

If I had mod points, I'd give you one.

Posted Apr 15, 2010 10:45 UTC (Thu) by qu1j0t3 (guest, #25786) [Link]

Well said.

Resetting PHP 6

Posted Mar 31, 2010 8:48 UTC (Wed) by roerd (guest, #64880) [Link] (1 responses)

> Rakudo is something different, a Perl-like language, perhaps a steppingstone for future Perl technology. But it isn't Perl 6.0 to this member of the audience. It is Rakudo. Not Perl.

By that definition there will never be a Perl 6.0, because Perl 6 is a specification, not an implementation. Though of course you're right that at this time Rakudo can't be an implementation of Perl 6.0, because the specification is still a moving target.

Resetting PHP 6

Posted Mar 31, 2010 9:51 UTC (Wed) by Darkmere (subscriber, #53695) [Link]

Indeed, and this makes me quite sad. Because really, it feels as if Perl slipped off the map and into la-la-land. Not in the Duke Nukem Forever style, but by setting things up so that you cannot deliver Perl6, because it's some immaterial beast that has yet to be able to exist.

Resetting PHP 6

Posted Mar 25, 2010 17:16 UTC (Thu) by b7j0c (guest, #27559) [Link]

well the development version of perl6 can be used right now

python3 and perl6 are both a ways off from full adoption by their own communities, but people are using these tools right now, and the "stable" versions of each tool (python2.x and perl5.x) are also active.

you can't compare php6 to these tools. php6 has essentially lost five years of effort

Resetting PHP 6

Posted Mar 27, 2010 0:37 UTC (Sat) by cmccabe (guest, #60281) [Link]

ln -s /usr/bin/ruby /usr/bin/php6

Problem solved; who's up for lunch?

Resetting PHP 6

Posted Jun 15, 2011 7:16 UTC (Wed) by nivas (guest, #75700) [Link]

Hi, when will it be released?


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds