Talk:Byte

From Citizendium
Revision as of 20:57, 28 April 2007 by imported>Catherine Woodgold (→‎1024 vs. 1000 again: "Standard" may not be standard; and clarifying KiB.)


Article Checklist for "Byte"
Workgroup category or categories Computers Workgroup [Editors asked to check categories]
Article status Developed article: complete or nearly so
Underlinked article? Yes
Basic cleanup done? Yes
Checklist last edited by Joshua David Williams 21:57, 12 April 2007 (CDT); Eric M Gearhart 16:52, 6 April 2007 (CDT)

To learn how to fill out this checklist, please see CZ:The Article Checklist.





missing on purpose?

Hi, I miss information about little- and big-endian byte order. It should IMHO be part of the byte story. Robert Tito |  Talk  20:54, 6 April 2007 (CDT)

I did not mention it because, quite honestly, that's largely outside my scope of knowledge. If you're knowledgeable in that area, we would appreciate a contribution to the article :) --Joshua David Williams 21:03, 6 April 2007 (CDT)
Edit - I did not realize you're an editor when I wrote that. If you're busy, I could find another user to help (Eric may be able to). In answer to your question, no, it was not excluded purposely - that is, to not include it at all. --Joshua David Williams 21:06, 6 April 2007 (CDT)

what is it

Big- and little-endian refer to the 'sign' bit: in big-endian it is at the end of the byte, in little-endian at the beginning (or the other way around; I still have to look that up). It is used in various protocols to distinguish them from others. The best-known example is IPX/SPX versus TCP/IP. It took Cisco quite some problem-solving to let these two networks communicate without problems (it gave rise to their IOS version 13 and above, created when I was on the phone with them). The signs of bytes are important for the variables needed to transfer specific information. Robert Tito |  Talk  21:43, 6 April 2007 (CDT)

Should this be a separate article that deserves a mention on Byte? Remember we don't want to overwhelm the average person with too much info stuffed into the Byte article --Eric M Gearhart 04:07, 7 April 2007 (CDT)

I think it is more relevant than all the prefixes as it IS info within a Byte. Robert Tito |  Talk  09:01, 7 April 2007 (CDT)

See this jargon? That's exactly what we want to avoid. I can't make heads or tails of it. Could someone please explain this in layman's terms? --Joshua David Williams 09:47, 7 April 2007 (CDT)

Big and little endian are only a way to tell the machine what the sign of the byte is: signed (+) or unsigned. Signed means only positive values are allowed. Unsigned means the whole range of the number space can be used. If your address space allows for file sizes up to 4 GB and you use an unsigned int to address it, you CAN access that space. Using a signed variable allows you to address only 2 GB. Endian types only state where that sign is stored in the byte: the low bit or the high bit. Nothing more, nothing less. Some compilers predominantly use big-endian variables, others little-endian. Windows and Unix in general use the two different styles. Robert Tito |  Talk 

I don't think I'd put it that way. First of all, the sign bit is a function of how integer values are encoded, not of bytes themselves. Big endian means most significant bit first, and little endian means least significant bit first. We write numbers in big endian form because 24 is 20 + 4, and the most significant digit comes first. Of common architectures, the i386 (including the Pentium etc.) is little endian, and virtually everything else is big endian. Oh, and you might want to mention the connection to "Gulliver's Travels". Greg Woodhouse 22:40, 12 April 2007 (CDT)
I think I'm finally starting to understand this concept clearly. I'm going to re-write the endianness section of the article to make it a bit clearer, especially of what the "most significant byte" is. --Joshua David Williams 22:49, 12 April 2007 (CDT)
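For illustration, here is a minimal Python sketch of the two orderings. In common usage, endianness describes the order of the bytes within a multi-byte value as stored in memory or sent on the wire, so the sketch shows byte order rather than bit order. The value 0x0A0B0C0D is an arbitrary example, not something taken from the article.

```python
# Illustrative sketch: one 32-bit integer laid out in both byte orders.
value = 0x0A0B0C0D  # arbitrary example value (an assumption for this sketch)

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())     # 0a0b0c0d
print(little.hex())  # 0d0c0b0a

# Reading bytes back with the matching order recovers the value;
# reading them with the other order gives a different number.
same = int.from_bytes(big, byteorder="big")
swapped = int.from_bytes(big, byteorder="little")
print(same == value, swapped == value)  # True False
```

The same four bytes appear in both cases; only their order differs, which is why two machines must agree on a byte order before exchanging multi-byte numbers.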

OK I will try and work in a one-liner on Byte, something like an "also worth mentioning is whether a Byte is big-endian or little-endian" and a link to an Endianness article.. maybe with Big endian and Little endian redirecting to it.

And yea holy crap the Wikipedia article looks more like "Look at me I can write terse technical articles" rather than striving to be reachable to the masses.

To clarify on Robert's signed versus unsigned example: You would use an "unsigned" variable for a file system, because you're only going to deal with positive numbers. You would use a "signed" (meaning has positive and negative) address space when talking about a number that can be from -2 to positive 2 (for example).

In very very simple terms, "big endian" means you're placing importance on the leftmost numbers first. "Little endian" means you're placing importance on the rightmost numbers first.

For example: Networks generally use big-endian order; the historical reason is that this allowed routing while a telephone number was being composed.

757-421-2233 is big endian, because first comes the area code (Virginia), then 421 is the prefix (Norfolk), and then the last four numbers actually get you to the specific house.

That's the type of explanation we need in the Endianness article in my opinion --Eric M Gearhart 10:43, 7 April 2007 (CDT)

Not totally true but nice as metaphor. Robert Tito |  Talk  11:29, 7 April 2007 (CDT)
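The 4 GB versus 2 GB figures mentioned in this thread follow from the range of a 32-bit integer, independently of byte order. A rough Python sketch, assuming a 32-bit width and two's complement for the signed case (both are assumptions chosen for illustration):

```python
# Sketch: value ranges of a 32-bit integer, signed versus unsigned.
BITS = 32        # assumed integer width
GIB = 2 ** 30    # bytes in one binary gigabyte

unsigned_count = 2 ** BITS        # distinct non-negative values: 0 .. 2**32 - 1
signed_count = 2 ** (BITS - 1)    # non-negative values when one bit carries the sign

signed_min = -(2 ** (BITS - 1))   # most negative two's complement value
signed_max = 2 ** (BITS - 1) - 1  # most positive two's complement value

print(unsigned_count // GIB)      # 4  (byte offsets cover 4 GiB)
print(signed_count // GIB)        # 2  (non-negative offsets cover 2 GiB)
print(signed_min, signed_max)     # -2147483648 2147483647
```

A signed variable can hold negative numbers as well as positive ones; the cost is that half of the bit patterns are spent on the negative range.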

bigger better?

Both LaCie and Iomega have single-disk enclosures out with disks of 1 TB below US$500. The data density, however, is so high that these disks cannot be used without solid error correction. Bigger is not always better, at most easier. Robert Tito |  Talk  09:36, 7 April 2007 (CDT)

What else should be added?

I did a bit of research on the topic of endianness and added a section for it. If anything I said is inaccurate, please correct it. Also, what else should we add? --Joshua David Williams 19:12, 12 April 2007 (CDT)

Hexer image

Should Image:Hexer.png be in this article? I'm not sure since it shows the data in hexadecimal format instead of binary. --Joshua David Williams 19:19, 12 April 2007 (CDT)

Well bytes can be represented in Hex or binary (or octal or decimal or...). I'd say that the caption of the image should reflect that "these values represent bytes in Hexadecimal." --Eric M Gearhart 20:02, 12 April 2007 (CDT)
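To make the point about notations concrete, a small Python sketch that prints one byte in binary, octal, decimal and hexadecimal. The helper name `representations` and the sample value 72 are hypothetical, chosen only for this example.

```python
# Sketch: one byte value written in four different notations.
def representations(byte):
    """Return the same 8-bit value in binary, octal, decimal and hexadecimal."""
    if not 0 <= byte <= 255:
        raise ValueError("a byte holds values from 0 to 255")
    return {
        "binary": format(byte, "08b"),
        "octal": format(byte, "03o"),
        "decimal": format(byte, "d"),
        "hexadecimal": format(byte, "02x"),
    }

for name, text in representations(72).items():
    print(name, text)
# binary 01001000
# octal 110
# decimal 72
# hexadecimal 48
```

Two hexadecimal digits always cover exactly one 8-bit byte, which is presumably why hex editors use that notation.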

Integers

Is this the place to discuss how signed values are encoded (i.e., one's complement vs. two's complement)? Greg Woodhouse 22:42, 12 April 2007 (CDT)

In my opinion, this whole topic is really a subtopic of computer architecture. And how integers are represented should get its very own article. Consult a good book on computer architecture and you'll likely find at least an entire chapter about integer representation. I don't think we should try to be a textbook here. I think, instead, we should help get readers oriented about a topic (really understand the issues and history), but students seeking to learn the ins and outs of integer representations already have many references to choose from. Pat Palmer 20:27, 23 April 2007 (CDT)
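For anyone following the one's complement versus two's complement question, a brief Python sketch of how a small negative number is encoded in 8 bits under each scheme. The function names are hypothetical, and range checking is left out to keep the sketch short.

```python
# Sketch: 8-bit bit patterns for a value under two encodings of negatives.
def twos_complement(value, bits=8):
    """Bit pattern of value in two's complement (wraps modulo 2**bits)."""
    return value & ((1 << bits) - 1)

def ones_complement(value, bits=8):
    """Bit pattern of value in one's complement (negatives invert every bit)."""
    mask = (1 << bits) - 1
    if value >= 0:
        return value
    return ~(-value) & mask

print(format(twos_complement(-5), "08b"))  # 11111011
print(format(ones_complement(-5), "08b"))  # 11111010
print(format(twos_complement(5), "08b"))   # 00000101
```

The two patterns for a negative number differ by exactly one: two's complement is the one's complement pattern plus 1, which also removes the separate "negative zero" that one's complement has.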

kibibyte?

I'd like to hear what other editors have to say, but kibibyte sounds like a neologism that never really gained acceptance. Certainly, I've never heard it used. A Google search did turn up an interesting page though, [1]. Apparently, there actually was a proposal circulated some years ago, but I don't know how far it went. As a general rule, powers of 2 are used for disk storage. For example, a typical block size on modern filesystems is 4K, meaning 4096 bytes, not 4000. On the other hand, data rates are always expressed in powers of 10. The 10 in 10base-T means 10 megabits per second, and the nominal data rate of 100base-T is 100 megabits per second. Greg Woodhouse 23:18, 12 April 2007 (CDT)

Okay, here you go

1541-2002

IEEE Trial-Use Standard for Prefixes for Binary Multiples

Status: Active
Publication Date: 2003
Page(s): 0_1- 4
E-ISBN: 0-7381-3386-8
ISSN: 
ISBN: 0-7381-3385-X
Year: 2003 
Sponsored by: 
   SCC14

OPAC Link: http://ieeexplore.ieee.org/servlet/opac?punumber=8450

Calling this terminology "standard" overstates things, IMO. Greg Woodhouse 23:32, 12 April 2007 (CDT)

See this page as well. --Joshua David Williams 23:34, 12 April 2007 (CDT)

Yes, I saw that, too. IEC might publish a standard, but the IEEE approach is much more, well, realistic. Truth be told, I can't even find the IEC document, so I'm not sure of its status, but I think IEC is just spelling out the meaning of some new words, should you choose to use them. At best, I think this terminology can be called experimental. Greg Woodhouse 23:49, 12 April 2007 (CDT)

So how should we deal with it in this article then? --Joshua David Williams 10:09, 13 April 2007 (CDT)
I've heard it quite a bit (hehe) in the last few months, but never before that. I wouldn't call it a standard now, but it's definitely worth mentioning because the differences are going to be very large. From what I've seen, only the "1337" are using "KiB". Andrew Swinehart 10:22, 13 April 2007 (CDT)

Differences

In a recent edit, Phillip Stewart changed the percentage of difference between a yottabyte and a yobibyte from 1.209% to 17.281%. Is this correct, a mistake, or vandalism? I used the formula (2^80)/(10^24) to calculate my number. --Joshua David Williams 00:00, 13 April 2007 (CDT)

I've reverted Phillip's version for the sake of consistency. I believe that he was incorrect. If not, please post a message regarding this. This is important information that we must know - and agree upon - when writing an article. --Joshua David Williams 00:25, 13 April 2007 (CDT)

I don't think so. See for yourself

1 - (pow(10, 24) / pow(2, 80))
= 0.17281938745
 
0.17281938745 * 100
= 17.281938745

so 17.2819% is right. Greg Woodhouse 00:29, 13 April 2007 (CDT)

Ah, I see my mistake now. I'll fix the table, but we'll need to check these things very carefully afterwards. --Joshua David Williams 00:32, 13 April 2007 (CDT)
I think we should use the raw numbers and not percents. IMO, they're much easier to understand and calculate. (2^10)-(10^3), (2^20)-(10^6), etc. Thoughts? Andrew Swinehart 10:38, 13 April 2007 (CDT)

I'd stick with percentages, as the point is that the differences can be substantial. (Did you see the footnote about the disk manufacturer that tried to use powers of 10 and the subsequent lawsuit?) Of course, if there's room, raw numbers might be a good thing to include, too. Greg Woodhouse

But the raw numbers actually show the difference more. KB vs. KiB is 24, MB vs. MiB is 48576, GB is 73741824, and they just keep getting bigger. I say the raw numbers show the difference much better than percents. Or, we could just add another column. Andrew Swinehart 11:06, 13 April 2007 (CDT)

Another column would be great if it fits alright. --Joshua David Williams 11:08, 13 April 2007 (CDT)

I went ahead and added a column, but I couldn't find a calculator that could tell me the exact answer for the last row, so I had to use scientific notation. If any of you can get the exact answer, that'd be great. --Joshua David Williams 11:18, 13 April 2007 (CDT)
I found the exact number here. I checked it, and it's correct. --Joshua David Williams 16:36, 13 April 2007 (CDT)
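For checking the table, both the raw differences and the percentages can be computed exactly with arbitrary-precision integers. This is a sketch rather than the method actually used for the article; the function names are hypothetical. Note that the percentage depends on which unit is taken as the base: 1 - 10^24/2^80 (the formula used above) gives about 17.28%, while 2^80/10^24 - 1 gives about 20.89%.

```python
# Sketch: exact differences between binary and decimal prefixes.
PREFIXES = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi",
            "peta/pebi", "exa/exbi", "zetta/zebi", "yotta/yobi"]

def difference(n):
    """Exact byte difference between the n-th binary and decimal unit."""
    return 2 ** (10 * n) - 10 ** (3 * n)

def percent_of_binary(n):
    """Difference as a percentage of the binary unit: 1 - 10^(3n) / 2^(10n)."""
    return (1 - 10 ** (3 * n) / 2 ** (10 * n)) * 100

def percent_of_decimal(n):
    """Difference as a percentage of the decimal unit: 2^(10n) / 10^(3n) - 1."""
    return (2 ** (10 * n) / 10 ** (3 * n) - 1) * 100

for n, name in enumerate(PREFIXES, start=1):
    print(f"{name:10} {difference(n):>26} "
          f"{percent_of_binary(n):8.4f}% {percent_of_decimal(n):8.4f}%")
```

Python integers have no fixed size, so the last row (yottabyte versus yobibyte) comes out as an exact number rather than in scientific notation.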

Nibble and word

Should nibble and word be combined into this article, just as megabyte is? It seems to me that there really isn't much to say about these topics that couldn't be said here briefly. --Joshua David Williams 12:47, 13 April 2007 (CDT)

Finished?

I believe this article is finished. Could an editor please take a final look at it? --Joshua David Williams 17:00, 13 April 2007 (CDT)

1024 vs. 1000 again

I really wish you would revise the section where you discuss units of storage, because the statement you make, that the use of kilobytes as a unit of measurement is "non-standard", is factually incorrect. What is a correct statement is that, due to this potentially confusing terminology, IEC has standardized the terms kibibyte, etc. Standardizing the meaning of word B does not mean that use of word A is no longer standard, though using word B to mean something other than what IEC has defined it to mean would be. The obvious caveat here is that if the meaning of word A is also redefined, then the old use can be considered non-standard. I'm sorry to be a stickler here, but it's important to be precise. By the way, I think your use of the lawsuit to show why it is important to have standard terminology is important. You also might consider citing the IEEE document I mentioned above.

If I were you, I'd say something roughly like this (by all means rephrase and flesh it out as you see fit): Storage is measured in units that are powers of 2, but data rates are measured in units that are powers of 10. This means that in some contexts 1 kB = 1024 B, but in other contexts, 1 kB = 1000 B. This is potentially confusing (mention the lawsuit), so IEC has standardized a set of binary prefixes, and IEEE approved them as a trial-use standard.

It may be that we're on the cusp of a significant shift in terminology (rather like the new consensus that Pluto isn't a planet), but you're still going against the grain of industry practice here, and you do need to be careful not to use CZ articles for advocacy. Greg Woodhouse 22:41, 13 April 2007 (CDT)

What do you think of this?

===Conflicting definitions===

For more information, see: Binary prefix.

Traditionally, the computer world has used a value of 1024 instead of 1000 when referring to a kilobyte. This was done because programmers needed a number compatible with the base of 2, and 1024 is equal to 2 to the 10th power. Due to the large confusion between these two meanings, an effort has been made by the International Electrotechnical Commission (IEC) to remedy this problem. They have standardized a new system called the 'binary prefix', which replaces the word 'kilobyte' with 'kibibyte', abbreviated as KiB. This solution has since been approved by the IEEE on a trial-use basis, and may prove to one day become a true standard.[1]

While the difference between 1000 and 1024 may seem trivial, one must note that as the size of a disk increases, so does the margin of error. The difference between 1TB and 1TiB, for instance, is approximately 10%. As hard drives become larger, the need for a distinction between these two prefixes will grow. This has been a problem for hard disk drive manufacturers in particular. For example, one well known disk manufacturer, Western Digital, has recently been taken to court for their use of the base of 10 when labeling the capacity of their drives. This is a problem because labeling a hard drive's capacity with the base of 10 implies a greater storage capacity when the consumer may assume it refers to the base of 2. [2]

--Joshua David Williams 21:46, 14 April 2007 (CDT)

I think that's okay. Of course, IEC did standardize the meaning of the two terms, so it's not so much a matter of not being a true standard, as it is that saying the terminology now being used is non-standard is, IMO, too strong. It might be worth noting that a big reason that this terminology is confusing is that storage (e.g., drive capacities) is measured in powers of 2, but in most cases (e.g., data rates) the convention is to use powers of 10. This means that engineers working in one field use the same word to mean 1024 bytes that engineers in another field would use to mean 1000 bytes. Now that's confusing! Greg Woodhouse 21:58, 14 April 2007 (CDT)

Okay, I added that. I'm not sure whether everything is stated in the correct order or not. I may have my dad look at it, since he's very good at that sort of thing. What's your opinion on this? --Joshua David Williams 22:04, 14 April 2007 (CDT)
I reworded it a bit. See what you think. Greg Woodhouse 22:18, 14 April 2007 (CDT)

I think that's good. Let me know what you think of the changes I made. I wasn't exactly sure what all I should explain in further detail. I remember I defined "bit" briefly in the opening sentence. --Joshua David Williams 23:03, 14 April 2007 (CDT)

No, I think what you have looks pretty good. You might want to note that Hindu-Arabic numerals use big-endian order (as in your example of 1024). Most people don't think of this at first. But right now, I think the best thing to do is get someone else's impressions of the article, maybe someone with a less technical background. Greg Woodhouse 23:19, 14 April 2007 (CDT)

I'm not sure what you mean. Are you talking about the fact that 1024 would be stored as 0142 and not 4201? --Joshua David Williams 08:03, 15 April 2007 (CDT)

I'm not sure whether there's a standard meaning for the word "standard". I think it could mean either what an organization such as IEEE declares to be the standard, or it could mean what is very commonly used.
I'm going to insert "to mean 1024" after the mention of KiB because I don't find that completely clear otherwise (i.e. whether it means 1024 or 1000) until you get to the table. --Catherine Woodgold 21:57, 28 April 2007 (CDT)

ASCII

OK, so ASCII is added; where is EBCDIC? It would be a nice addition since ASCII is in effect 7-bit and EBCDIC 8-bit. Also the need for Unicode seems eminent. Robert Tito |  Talk  19:56, 23 April 2007 (CDT)

character encoding, charsets, and "plain text"

The notion of plain text no longer holds these days, as it is muddied by the myriad options for character encoding (including various flavors of Unicode), as well as many different character sets. I'm not sure how we should handle it here, but it's much more complicated than this article currently portrays it. I'm not sure we need to write about it in Citizendium, as there are multiple references available on the internet, but it's something to be aware of. Pat Palmer 20:23, 23 April 2007 (CDT)

It may not be something we want to dive into deeply, but I think that's definitely something we need to mention. It sounds like you know more about this topic than I do. Perhaps you could give a go at it. --Joshua David Williams 20:41, 23 April 2007 (CDT)
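As a small illustration of why "plain text" depends on the encoding, the Python sketch below encodes the same characters under several schemes. The choice of `cp037` is an assumption: it is only one of many EBCDIC code pages, picked here because Python ships a codec for it.

```python
# Sketch: the same characters become different bytes under different encodings.
letter = "A"
ascii_bytes = letter.encode("ascii")   # 7-bit ASCII
ebcdic_bytes = letter.encode("cp037")  # one EBCDIC code page (an assumption)

accented = "é"
latin1_bytes = accented.encode("latin-1")   # one byte
utf8_bytes = accented.encode("utf-8")       # two bytes
utf16_bytes = accented.encode("utf-16-le")  # two bytes, little-endian order

print(ascii_bytes.hex(), ebcdic_bytes.hex())                  # 41 c1
print(latin1_bytes.hex(), utf8_bytes.hex(), utf16_bytes.hex())  # e9 c3a9 e900
```

A reader who sees the byte 0xC1 cannot tell whether it is the letter A in this EBCDIC code page or something else entirely without knowing which encoding was used.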

references

We can at least give references to these articles and briefly indicate them and their topics. Robert Tito |  Talk  20:46, 23 April 2007 (CDT)

endianness

works at the byte level, by storing the least significant bit at the lowest or highest end of the byte, so my remark about the BIT instead of the BYTE was correct. Robert Tito |  Talk 

oh oh, I think you're right. do you have the energy to fix it? I'm done for the moment. Thanks for pointing out! Pat Palmer 21:15, 23 April 2007 (CDT)
no big deal, yeah I will fix it (again) hehehehe rub it in rub it in :) Robert Tito |  Talk  By the way why did you delete +1000 and 1000+? as that is exactly the difference between the two systems.
I think I did too much at one time and became absolutely brain-dead. Better quit tonight while the quitin's good. Till next wild edit. Pat Palmer 21:27, 23 April 2007 (CDT)

lol OK, will you add it tomorrow again? you take WILD WIKI editing to the next level :) Robert Tito |  Talk