I note that, like many nascent specs, TOML does not document the escape sequences accepted in strings. Nor does it exhaustively specify integer and float formats - rather ironic for a spec that advertises "TOML is designed to be unambiguous and as simple as possible."
The limitation on array types seemed fairly arbitrary at first glance, but after thinking it over I realized it aids compatibility with languages that do not support heterogeneous arrays. Though as far as the types go, I would add booleans and perhaps non-quoted strings for single-word values.
Now that the technical criticism is out of the way, holy crap this guy is arrogant.
As proper nouns become more common, they first lose any capitalization in the middle of the word, and then finally capitalization of the initial letter. It's human language. It happens.
Yes, and I think it's arrogance on the part of Wordpress (there I did it) folks to insist that everyone capitalize it in the prescribed manner. Especially since they weren't consistent from the get-go. They even went so far as to make Wordpress (trolol) itself filter content to be capitalized if someone tries using the lower case p. http://justintadlock.com/archives/2010/07/08/lowercase-p-dan...
It's to do with protecting their trademark, though. Human language turning proper nouns into ordinary words is something companies don't like at all. In the case of WordPress, there's a lot of potential for abuse if anybody can use the name for their own system.
Hehe, that reminds me of iphones auto-correcting "iphone" to "iPhone". Jeez that would irritate me, I'm trying to write a text message, not look like an iDouche...
Ugh. Off topic, but I dislike this Perl/Ruby tendency of calling hash tables "hashes". When I see the word hash, I always think of a value (i.e. a hash code) and not a data structure. Why couldn't they call it a hash map, hash table, map, table, dictionary, etc. like all the other languages?
I agree with that. 'map' or 'dictionary' are the best choices I think (or 'associative array', but why bring arrays into it). That's the interface, of which a hash table is just one possible implementation.
I've never liked 'dictionary'. The analogy isn't at all apparent to me. A dictionary explains what words mean; the thing we're talking about doesn't explain what keys mean. (Someone who spends most of his time writing Python here.) 'map' or 'mapping'.
It's about the operations. One Does Not Simply (tm) read a dictionary. One instead performs a “lookup” for a particular item. The dictionary is designed to make this lookup fast and reliable, which matches the purpose of these data structures in software.
A dictionary maps words to their definitions. The words are the keys, the definitions are the values. Seems reasonable to me. Though as another predominantly pythoner, I do prefer map as well.
In a dictionary the value (meaning) is often (partially) implied by the key (word), by etymology etc. In the data structure there need be no relationship between the key and value other than the fact that they are a key-value pair in this instance. It introduces messy cultural concepts into what should be a clean, abstract concept.
I think dictionary is a useful high-level analogy. Small key objects mapping to potentially large, and often structured, value objects. (By structure, I mean the definition in a dictionary often includes fields like pronunciation and origin.)
Sure, so go the Lua route: table. Or the python route: dictionary. If neither of those do it for you, how about "mapping"?
Hash (and hash map, hash table etc) leak too much implementation detail. What if you want a tree-based mapping instead? I like how in C++ it's map (for ordered, rb-tree based maps) and unordered_map (for unordered, hash table based maps).
In my experience, making obviously bad things difficult or impossible improves reliability. This idea certainly resides within my cranial cavity, but that doesn't necessarily make it wrong.
The obviously bad part is that you pollute the global namespace for no reason other than laziness. When someone comes across code that uses a "Table" object interchangeably with "Dictionary" and "Hash", then he's going to have to look through the source code to find this bizarre line only to find out that you renamed a built-in container for no good reason.
So naturally people talk of Hashes etc. I understand where you're coming from, but it's really not very important, and it would be more confusing to talk of Hash Tables as learners would naturally look for HashTable in the stdlib.
That was the question: why is the class named Hash instead of HashMap or Dictionary? Was it done intentionally, or is it just an accident because someone did not know English very well?
I agree Map or Dictionary would have been fine too, but it's so widely used it needs to be easy to type, so two words is not great (HashMap). However, I suspect it was just named that way following Perl (written by an English speaker). Obviously it's far too late to change it now, and I can't say it bothers me or most Ruby users. It's something you get used to very quickly.
Though even HashMap isn't bad, because typing is a solved problem - with auto-completion and touch typing, two words really aren't an issue in my mind.
People get used to living with all kinds of things, but that doesn't make them any better. Yes I'm aware that this applies equally to my typing comment as to you having got used to hash.
Because we need a decent human readable format that maps to a hash, and the YAML spec is like 600 pages long and gives me rage. No, JSON doesn't count. You know why.
I don't know why, and I would love it if someone could explain.
Other than comments, I see no difference between the two.
Also, "human readable" is not accurate; it should be "hacker readable" - IT folks are the only target audience for these files.
[owner]
name = "Tom Preston-Werner"
organization = "GitHub"
bio = "GitHub Cofounder & CEO\nLikes tater tots and beer."
dob = 1979-05-27T07:32:00Z # First class dates? Why not?
{
"owner": {
"name": "Tom Preston-Werner",
"organization": "GitHub",
"bio": "GitHub Cofounder & CEO\nLikes tater tots and beer.",
"dob": "1979-05-27T07:32:00Z"
}
}
In JSON, that datetime won't deserialize to a datetime instance in your language with a conforming parser. Further, JSON has no comments (a killer for a configuration format).
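To make that concrete, here's a small Ruby sketch (any conforming parser behaves the same way): the dob value from the JSON example above comes back as a plain String, and converting it into a Time is left to application code.

```ruby
require "json"
require "time"

doc = JSON.parse('{"owner": {"dob": "1979-05-27T07:32:00Z"}}')
dob = doc["owner"]["dob"]

dob.class         # => String: the parser has no way to know this is a date
Time.iso8601(dob) # application code must opt in to the conversion itself
```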
JSON wasn't invented, it was discovered, from a long evolution of programming languages. The punctuation isn't ceremony. It's the amount needed for it to be concise (clear and terse, not just terse).
The difficulty level is hardly extreme. It is not an unreasonable challenge to learn that writing an array of elements requires opening and closing brackets.
The issue here might be that JSON has become widely used for two things:
Data marshalling/transfer
Config formats
For the latter, as they are typically written by hand, it's not particularly appropriate: the syntax is noisy, multiple levels of bracket nesting tend to lead to errors even if you understand it perfectly well in principle, and of course there are no comments, no datetimes, etc.
I imagine this is intended as a saner version of YAML for configs.
I didn't view bracketing as the enemy (which seems to be the focus of a lot of config syntaxes) but rather the combination of multiple types of bracketing, plus start-and-stop usage of shift keying. I only have two types of brackets, the sequence [ type and the long string {" type, and you can "feel" when you're writing a long string because of that sudden need to use the shift key.
It isn't a markup language. I'd like to correct this mistake that was started by YAML. :/ http://en.wikipedia.org/wiki/Markup_language (Using the backronym "YAML Ain't Markup Language" only helped it grow, making more people confused as to what a markup language is.)
I like it, though. More grepable than JSON or YAML, with the way it handles nested keys using dot notation.
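For instance (a made-up fragment, not taken from the spec), a key-group header carries the whole path on one self-describing line:

```toml
[servers.alpha]
ip = "10.0.0.1"
role = "frontend"
```

A grep for "servers.alpha" lands directly on the group, whereas in the equivalent JSON "servers" and "alpha" sit on separate lines of nesting, so a single grep hit loses its context.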
I've always been a fan of the .INI syntax but the lack of a standard (which I think Microsoft should have championed) made the format hard to use consistently. There have been attempts at standardization [1] but, alas, they never spread widely enough. In light of the above, I'm glad to see an INI-derived format with a real spec -- not necessarily because it might replace JSON but because it might replace INI.
Speaking of INI, for the longest time the killer app for INI files for me was persistent data storage in batch scripts (.bat/.cmd files in Windows 9x/NT). Using a command line utility like [2] or a similar program from IBM that sadly wasn't legally redistributable, you were able to achieve persistence with minimum effort, which would otherwise be difficult to program in batch. I even wrote a portable clone of inifile.exe for MS-DOS and Linux to be able to reuse my scripts more easily. TOML would surely benefit from the same.
The biggest concern with JSON seems to be the lack of comments. So what voodoo is Sublime Text 2 performing? Why can't we just use that?
{
// Sets the colors used within the text area
"color_scheme": "Packages/Color Scheme - Default/Monokai.tmTheme",
// Note that the font_face and font_size are overriden in the platform
// specific settings file, for example, "Preferences (Linux).sublime-settings".
// Because of this, setting them here will have no effect: you must set them
// in your User File Preferences.
"font_face": "",
"font_size": 12,
// Valid options are "no_bold", "no_italic", "no_antialias", "gray_antialias",
// "subpixel_antialias", "no_round" (OS X only) and "directwrite" (Windows only)
"font_options": [],
// Characters that are considered to separate words
"word_separators": "./\\()\"'-:,.;<>~!@#$%^&*|+=[]{}`~?",
// Set to false to prevent line numbers being drawn in the gutter
"line_numbers": true
}
Some JSON implementations support comments, others don't. If you know the one you use supports (and will continue to support) comments, go ahead and use it. It just won't be portable.
Douglas Crockford himself suggests you "Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser." That sounds like a reasonable workaround.
I don't know specifically about Sublime Text 2, but from my own experience writing a configuration library which accepts JSON as an input, you usually strip out those comments before feeding the resulting content to your appropriate json_decode function.
1. Read the contents of your JSON file.
2. Strip out the comments with some regex foo or such.
3. Feed the remaining contents to your JSON parser.
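The three steps above can be sketched in Ruby. Note the regex only handles whole-line // comments; a "//" inside a string value would need a real tokenizer, so treat this as illustrative, not robust.

```ruby
require "json"

# Step 1: the raw file contents (inlined here instead of File.read)
raw = <<~CONF
  {
    // Sets the colors used within the text area
    "font_size": 12,
    "line_numbers": true
  }
CONF

cleaned = raw.gsub(%r{^\s*//.*$}, "") # step 2: strip comment lines
config  = JSON.parse(cleaned)        # step 3: feed to the parser
config["font_size"] # => 12
```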
Given Jekyll's enormous backlog of issues and pull requests[0], can we expect this to be maintained or supported any bit beyond the late night drunken brain fart that this is?
Parker Moore and I (along with many contributors) have been spending quite a bit of time on Jekyll recently. Over the last 30 days we've merged 17 pull requests and closed 62 issues. We're ramping up for a 1.0 release and there's a brand new website in the works. You can check it all out on the master branch.
Tens (or possibly hundreds) of thousands of people use Jekyll now. It's interesting to note that Jekyll started out as a "brain fart" as well. Just one amongst hundreds of blog engines. I wrote it because I was dissatisfied with everything on the market, and I thought I could do something different and better, to serve my own needs. I open sourced it, because I thought others might get a kick out of it.
I'd wager that most of the great things we use today started as nearly ephemeral emanations from someone's mind, often late at night, or helped along by a snifter of brandy. The funny thing is, if you never try out your crazy ideas, you'll never know which ones might have changed the world.
The pull has 19 people asking for integration and has some stellar comments:
"seriously? year long pull request with two lines of changes?"
"I normally would think that the github gem features for paying users would get a lot of attention from the folks at github..."
I even tried getting it pulled via pre/post-sales emails to enterprise@github.com (I'm an enterprise customer), which was met with a "yeah, I'll tap him on the shoulder" to integrate - a year later, nothing.
That project isn't actively maintained at the moment (nor is it an official GitHub project), but I'll see what I can do tomorrow to get it merged in and released. Sorry for the frustration!
Cool. I'm very happy that you're back actively working on Jekyll! Guess my statement was a bit outdated then. Take it with the appropriately sized grain of salt.
Note: there's nothing wrong with releasing brain farts; quite the contrary. I didn't at all mean to imply that you shouldn't do that.
> I'd wager that most of the great things we use today started as nearly ephemeral emanations from someone's mind, often late at night, or helped along by a snifter of brandy.
- No native support for numbers, dates, booleans or lists. The latter can be implemented using subelements, but it's so cumbersome that you skimped on that and used a non-typed string instead (the database ports).
- Redundant verbosity. Root elements, closing tags, way too much crap to be manually inserted.
- XML parsers are huge, complex beasts which have no place in many smaller applications.
- Being XML, it leaves way too many possibilities for crappy developers. Namespaces in config files, oh joy!
Most of your points are environment-specific, and I think you forgot the strongest of them - "XML APIs usually suck". In .NET they are non-issues.
And about being cumbersome and verbose, the point I tried to make is that you don't have to be zealous and put every small piece of data in a separate element. No reason not to put data in attributes or even in comma/whitespace separated strings, if that piece of data can be extracted in one short line of code.
How so? .NET can't magically discover the types of values or prevent developers from abusing the format.
> you don't have to be zealous and put every small piece of data in a separate element.
But then you're layering a complex format with a custom application-specific parser, with an unknown syntax (e.g. spaces vs commas, are ranges supported, etc). It obviously can be done, but it's a mess.
Agree. And if XML supported unnamed closing tags, it'd lose a lot of its rep for verbosity. Although in this case you'd just be replacing </servers> with </>, in other documents it would be a lot more noticeable.
I will note this isn't a valid XML document: you have no root node.
Go "back"?! There are lots of places where XML is alive and well, and config files are one of them. And you can see why - empty elements with attributes look rather concise, without all that punctuation noise JSON has.
# line 43
array = $1.split(",").map {|s| s.strip.gsub(/\"(.*)\"/, '\1')}
You should recurse into coerce here, or you'll just lose types. (Also you're assuming arrays of strings.)
array = $1.split(",").map {|s| coerce(s) }
--
You're also not dealing with nested key groups. (eg. [servers.alpha]).
--
That being said, naïve string parsing is a terrible way to build a new markup language implementation. It's the reason the Markdown landscape is such a mess[1]. What this really needed is a formal grammar.
[1]: I actually tried to fix that by writing a formal lexer & informal parser for Markdown in a side-project of mine[2]. It's not quite there yet, because for practicality reasons I wrote my own parser instead of a formal AST-generating parser.
Yup array handling is weak. I was going to recurse into coerce, but then the examples made it seem like only strings will be accepted in arrays (he put "8000" in there rather than just 8000). I'll get clarification.
Don't have time to work on it now, but it looks like you'll need to recurse while parsing arrays. Right now, only arrays of strings that don't contain commas are handled correctly.
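A rough sketch of what that recursion could look like (hypothetical code, not the actual parser; escaped quotes inside strings are not handled): split the array body on top-level commas only, so nested arrays and quoted strings containing commas survive, then coerce the scalar tokens.

```ruby
# Coerce a scalar token into a Ruby value.
def coerce(token)
  case token
  when /\A"(.*)"\z/m    then $1
  when /\A-?\d+\z/      then token.to_i
  when /\A-?\d+\.\d+\z/ then token.to_f
  when "true"           then true
  when "false"          then false
  else token
  end
end

# Parse an array literal, recursing into nested arrays.
def parse_array(src)
  src = src.strip
  body = src[1..-2] # drop the surrounding [ ]
  items = []
  buf = +""
  depth = 0
  in_str = false
  body.each_char do |ch|
    in_str = !in_str if ch == '"'
    depth += 1 if ch == "[" && !in_str
    depth -= 1 if ch == "]" && !in_str
    if ch == "," && depth.zero? && !in_str
      items << buf.strip # a top-level comma ends the current element
      buf = +""
    else
      buf << ch
    end
  end
  items << buf.strip unless buf.strip.empty?
  items.map { |t| t.start_with?("[") ? parse_array(t) : coerce(t) }
end

parse_array('[[1, 2], ["a,b", "8000"]]') # => [[1, 2], ["a,b", "8000"]]
```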
What's this fragmentation you speak of? Surely you won't be forced to use one particular format you don't like. An API usually supports multiple formats.
Good point, I didn't notice that one. Certainly an interesting case. The ones that exist but shouldn't aren't, I think, as big an issue, but it's certainly not an easy problem to solve here.
What is wrong with JSON? Everything already supports it.
JSON has two drawbacks: a lack of comments (although you could add "#" keys in relevant places) and no binary support (arbitrary conventions include base64) - but TOML doesn't support binary anyway.
It isn't a friendly form of human input. My error rate is 50%+; you have to lint on save to catch mistakes that are invisible to the naked eye.
No ability to override, extend or reference keys. This is most useful in config objects where, e.g., in a dev object you want to override the username and password for a database connection without repeating all the other parameters.
Lack of comments is pretty much a deal breaker for configuration. I see a lot of undocumented JSON used for configuration and I find it difficult to believe that is something we want for the future.
Lack of comments makes JSON much better for data exchange than formats with comments.
What is the difference between "JS Object notation" and JSON? Google searches show they are the same thing. JSON definitely does not have comments http://www.json.org/
About the use of markup languages as config files: I see that in most Python apps, the config file is just another Python script, not a markup language. This makes sense in a dynamic language and feels natural. I understand it is a habit to use YAML for config in Ruby apps. Is it not possible to just use a Ruby script as the config file, since the script can be loaded dynamically? What are the pros and cons of using a markup language as a config file vs. just using the app's language (Python/Ruby)?
Using script files to store config is convenient, but in some circumstances it could give malicious parties a chance to inject arbitrary executable code into your environment, in ways that parsing a pure data file could not.
It is also common for ruby configs to be script files. Rails, for instance, has the config/initializers folder which is a set of ruby scripts that will be run at startup. It comes down mostly to preference.
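A hypothetical sketch of "the app language as config" in Ruby (the Config class and `set` helper are made up for illustration): evaluate a config snippet against a tiny DSL object instead of parsing a data format. The upside is full expressiveness; the downside, as noted above, is that the config file can run arbitrary code.

```ruby
# Minimal DSL object that collects key/value settings.
class Config
  attr_reader :settings

  def initialize
    @settings = {}
  end

  def set(key, value)
    @settings[key] = value
  end
end

cfg = Config.new
# In practice this string would come from File.read("config.rb"):
cfg.instance_eval <<~RUBY
  set :name, "Tom Preston-Werner"
  set :port, 8000 + 1   # expressions work, unlike in a data format
RUBY
cfg.settings # => {:name=>"Tom Preston-Werner", :port=>8001}
```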
This is quite nice, but there are a few things that I miss:
1. A way to have multi-line values for non array types
2. A more flexible number syntax (e.g. allow hex and binary integers, allow exponents on floats, allow NaN and +/-Inf)
3. Make it possible to have an extra comma after the last element on an array (as in Python)
4. Add a way to "include" another config file
#1 is important because some projects require all lines to have a max width of 80 characters, including in config files.
#2 is important for scientific/engineering projects. I think the current simple format shows that this format is a little too web centric. If this is going to be used for non-web stuff this is a must.
#3 is something that helps when putting this sort of configuration file in version control. Without it, adding an extra entry to a multi-line array creates a diff of two lines rather than one (since you must add a comma to the line above the one that you inserted). This is something I miss in JSON and which Python did just right (IMHO).
#4 would be useful in cases in which you want to provide a base configuration file for example.
Also, maybe I missed it but it is not super clear what would happen if you redefine an existing entry (I hope it is possible). Finally, is order important?
Agreed, I'd much rather have a normalized subset of YAML without the object serialization stuff (I don't even understand why it's there: why take a format intended to be read by humans and then muck it up with complex and dangerous object serialization notation).
I agree, and high quality YAML parsers are generally available in every language one might want to use. I don't believe I've ever encountered a situation where I was unable to obtain one. Well, Rust comes to mind, but then Rust is really young, and you could probably make one easily by just wrapping libyaml. That said, I might write a TOML parser in Python just for kicks.
I wondered the same thing. "No, you can't mix data types, that's stupid" leaves it ambiguous.
If you parse the outer array as just "array of arrays" (as each element is an array), you're not "mixing". But if we're supposed to be parsing it as "arrays of arrays of _type_", then we are mixing.
It's unspecified, I guess, but if you want to read into the spirit of it, which is to make it trivially-supportable by type-nazi languages such as haskell, you either get a [[Int]] or a [[String]].
Like JS from which it sprang, it lacks an integer type. Fortunately, parsers written for languages that do have integers can usually parse them correctly.
(If you don't know why this might matter, try opening your browser's Javascript console and evaluating 10000000000000001)
That's my peeve, though. I suspect that Tom is probably more concerned with readability. TOML also looks like it can be parsed a line at a time and doesn't really need to do any recursive parsing, so you could probably parse a stream of it as it arrives, which I imagine is trickier with JSON.
I'd agree if JSON had a different name, but given that it is called "JavaScript Object Notation" on the main page (http://json.org/) there's an implicit expectation that it's somehow related to javascript.
No comments. Lack of essential data types, forcing you to make the contents of strings part of a hidden unspecified semantic (this parses as date, that parses as time, etc). Constrained by the limitations of JS floats (they aren't even bigdecimal). Excessive significant punctuation. Insignificant white space (permitting a difference between valid, and pretty-printed form). Looks like executable code and tempts you to parse it with eval.
Text editors will sometimes insert end-of-line characters in the name of word-wrap.
Using the end-of-line as a comment terminator would require significant refactoring of JSON parsers, which were previously at liberty to lump CR and LF together with SP and TAB. A starting and ending token, on the other hand, fits the pattern already required of a JSON parser.
This reminds me of a new project I'm working on called Leewh. It's based on Wheel and kinda has the same overall function, but I needed something to get my project rolling quickly and using .ini and JSON syntax separately felt... well... too square, I guess.
I figured I'll come up with something more well rounded.