Programming language

From Citizendium
Revision as of 14:54, 10 May 2010 by imported>Johan Förberg (→‎Object-oriented vs. procedural: Added paragraph on functional programming)

A programming language is a human-readable lexicon and grammar that a programmer uses to instruct a computer how to operate. Programs written in a programming language have to be translated into machine code, usually by a compiler program. Machine code consists of the lower-level instructions that the computer can actually execute. Using a programming language allows programmers to work at a higher level than machine code, which is not readily human-readable.

Language categories

The following are some of the ways that people have categorized different computer programming languages, although there is not always agreement on the precise meaning of the categories, or which languages belong in them. This article will attempt to describe the more common contradictory uses of the following terms.

Compiled vs. interpreted

One way in which various programming languages have traditionally been categorized is as compiled vs. interpreted languages. In the traditional view, a program in a compiled language is translated ahead of time, by a compiler program, from human-readable source code into binary machine code. Some widely used early languages such as Fortran and C use pure compilation.

Conversely, interpreted languages rely on a special runtime application, called the interpreter, to read the source code and carry out its instructions while the program is running. An example of an early purely interpreted language is Snobol. Purely interpreted programs tend to execute more slowly because the interpreter must intervene while the program is "executing". HTML is a special-purpose language that is interpreted; the interpreter for HTML is called a web browser, which reads the HTML and renders a web page for display to a user based on the HTML code.

The division between compiled languages and interpreted languages has blurred since the 1990s with the advent of hybrid platforms such as Java and the .NET Framework (C# and VB.NET). Programs in these hybrid languages are compiled down to an intermediate language when the program is built (Java bytecode and Microsoft Intermediate Language, respectively). When the program is later run, the intermediate code is loaded into a sophisticated, optimized runtime engine for execution. Such runtime engines could be implemented as interpreters (early ones were), but nowadays they typically use just-in-time (JIT) compilers to generate native machine code from the intermediate language on an as-needed basis. So two compilers are involved: one used by programmers to create programs that contain intermediate code, and another used at runtime to "interpret" the intermediate language (or, in actuality, to compile portions of the intermediate code to native code on the fly as needed).
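The two-stage idea can be sketched with a toy example. This is only an illustration, not how the Java or .NET runtimes work: the instruction names (PUSH, ADD, SUB, MUL) and both function names are invented for this sketch, and the "runtime engine" here is a plain interpreter rather than a JIT compiler.

```python
import ast

def compile_to_intermediate(source):
    """Stage 1: translate an arithmetic expression into stack-machine code."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    program = []

    def emit(node):
        if isinstance(node, ast.Constant):
            program.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)    # operands first ...
            emit(node.right)
            program.append((ops[type(node.op)], None))  # ... then the operator
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return program

def run_intermediate(program):
    """Stage 2: a small runtime engine that interprets the intermediate code."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        else:
            right, left = stack.pop(), stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            else:  # "MUL"
                stack.append(left * right)
    return stack.pop()

program = compile_to_intermediate("2 + 3 * 4")
result = run_intermediate(program)
print(program)
print(result)
```

The intermediate program could be saved and run later on any machine that has the small runtime engine, which is the portability argument behind bytecode platforms.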

High-level vs. low-level

Another way in which programming languages are sometimes categorized is into "high-level" versus "low-level" languages. In a "high-level" programming language, one command or statement corresponds to many machine code instructions. "Low-level" programming languages, especially assembly languages, may have approximately one human-readable instruction per binary machine instruction. A "high-level" language may also sometimes be called "low-level" if it permits a programmer to perform certain (possibly risky) hardware or operating system operations. C is technically "high-level" but is sometimes regarded as "low-level" as well because it imposes few, if any, restrictions on what a programmer can do in terms of accessing the computer's raw hardware capabilities.
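The one-to-many relationship can be observed, by analogy, with Python's standard dis module. Note the assumption: what is listed here is CPython bytecode for a virtual machine, not real machine code, and the exact instructions vary between Python versions. The variable names in the statement are hypothetical.

```python
import dis

# One high-level statement ...
statement = "total = price * quantity + tax"

# ... compiled to CPython bytecode (an analogy for machine code, not the real thing).
code = compile(statement, "<example>", "exec")
instructions = list(dis.get_instructions(code))
instruction_names = [ins.opname for ins in instructions]

print(len(instructions), "lower-level instructions for one statement:")
for name in instruction_names:
    print("  ", name)
```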

General purpose vs. special purpose

A third categorization for programming languages is whether the language is "general purpose" or "special purpose". A language is considered general-purpose if, in principle, any program at all can be coded in it. Conversely, if the language is targeted towards making certain kinds of things possible, but does not do everything that other languages might, it is considered "special purpose". Examples of general-purpose languages are Fortran, C, Java and C#. An example of a special-purpose programming language is SQL (used to interact with database programs).

Markup languages (special purpose)

Markup languages contain a lexicon and grammar, but they are limited in purpose. Their purpose is to mark up text information into segments and to label each segment so that another program, sometime in the future, can "render" or display this information in a useful manner (instead of as one large blob of text). Examples of markup languages are HTML, LaTeX, SGML, XML and PostScript. HTML marks up information intended to be displayed later in a web browser; HTML tells the browser where paragraphs begin and end, which text to make into hyperlinks (and the targets of those links), what color to make the background, and so on. Web browsers later "interpret" the markup commands within HTML pages and then format the page for display to human readers. HTML also allows for the expression of some semantic information regarding the meaning of the text on the page. This use is slowly growing with the adoption of microformats and RDFa, and it allows parsers to do more intelligent things with content on the Web, such as extracting telephone numbers or event details and loading them into software specifically designed for handling and tracking calendar events and contacts. Markup languages thus often express more than simply the display of documents; they also express their meaning or role. PostScript commands are used to tell printers how to print documents; printers act as the "interpreter" for PostScript commands embedded within documents to be printed.

PDF is a derivative of PostScript and serves many of the same functions, but it can now also embed JavaScript and other features. XML takes the markup approach one step further. Not only can it be used for human-readable presentations, but it also provides a simple, consistent format that other programs can use to store and transfer data across platforms. There are special-purpose languages that are used to define the structure of XML-based languages (namely DTDs and XSD or RELAX NG schemas), as well as a language for transforming one XML-based language into another (XSLT).
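As a minimal sketch of XML used as a data format, the following reads a small document with Python's standard xml.etree.ElementTree module. The element names (contacts, person, name, phone) and the data are made up for this illustration and do not follow any particular schema.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical XML document that labels each piece of data.
document = """
<contacts>
  <person><name>Ada</name><phone>555-0100</phone></person>
  <person><name>Grace</name><phone>555-0199</phone></person>
</contacts>
""".strip()

root = ET.fromstring(document)

# Because every segment is labelled, a program can pick out exactly what it needs.
phone_book = {
    person.findtext("name"): person.findtext("phone")
    for person in root.iter("person")
}
print(phone_book)
```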

Object-oriented vs. procedural

Java is an example of a strictly object-oriented language. Every method (function) and every attribute (variable) must live within some class. Java allows no globals. By contrast, Python and C++ both provide objects but do not require their use; such languages are often called multi-paradigm. Inheritance is a useful object technique whereby a new class can be created by extending or modifying an existing one, so that one avoids reinventing the wheel. A language without object-oriented features is often described as a procedural language. Many very large modern programming projects use object-oriented programming methods to manage complexity and to tame side effects. Note that nearly any language can be used with an object-oriented methodology. With great effort, C or even assembly language (as in the GEOS project) can be used with object techniques. A modern programming language that takes the idea of object orientation further than Java is Ruby.
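Inheritance can be sketched briefly in Python. The class names (Shape, Rectangle, Square) are hypothetical and chosen only for illustration; each new class reuses the behaviour of an existing one and changes only what differs.

```python
class Shape:
    """Base class: knows how to describe itself, but not how to compute an area."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses must provide area()")

    def describe(self):
        return f"{self.name} with area {self.area()}"


class Rectangle(Shape):
    """Inherits describe() from Shape and supplies its own area()."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """A square is a rectangle with equal sides; only the constructor changes."""

    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


description = Square(3).describe()
print(description)
```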

An alternative approach to programming, which does not rule out the others, is functional programming. In functional programming, a program is regarded as a set of functions that live in their own bubbles, return a well-defined value for each set of arguments, and try to change the state of the program as little as possible. This can be compared with object-oriented programming, where functions typically act on the state of objects, and that state is often hidden from the programmer in private variables. The idea is that a problem can be reduced to a set of functions that do simple tasks and do not interact with each other more than absolutely required, reducing the risk of errors. This shares some of the positive effects of object-oriented programming, including reusability and managing complexity. Python is an example of a language that supports functional programming.
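A minimal sketch of the functional style in Python follows. The function names are illustrative only; the point is that each function depends solely on its arguments and leaves the input data unchanged.

```python
from functools import reduce

def square(x):
    return x * x

def is_even(x):
    return x % 2 == 0

def add(a, b):
    return a + b

def sum_of_even_squares(numbers):
    """Compose small pure functions; no variable outside this call is modified."""
    return reduce(add, map(square, filter(is_even, numbers)), 0)

data = [1, 2, 3, 4, 5, 6]
total = sum_of_even_squares(data)
print(total)
print(data)  # the input list is untouched
```

Because none of these functions changes shared state, each can be tested and reused on its own.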

Strongly-typed vs. loosely-typed

A strongly-typed language requires that each variable have a well-defined type at compile time. Java is strongly typed. A dynamically-typed language like Erlang or Python allows variables to be typed at runtime. A loosely-typed language may not require a variable to have any particular type. C allows the (void *) type, which is an invitation to cast. Casting changes the type of a value. The following example shows a cast in C:

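#include <stdio.h>
#include <stdint.h>
int main(void) {
   int x = 10;
   int *px = &x;
   /* The result of this cast is implementation-defined; the address may be truncated. */
   printf("cast px to int: %d, x is %d \n", (int)(intptr_t)px, x);
   return 0;
}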

The output is, for example (the address differs between machines and runs):

cast px to int: -1080346100, x is 10 

Here we cast px from type (int *) to type (int). Java does not allow this. Perl is loosely typed and allows a variable to change dynamically between a number and a string depending on the operators involved. Strict type checking at compile time in Java can help one avoid many errors. A strongly-typed language does not necessarily require that types be declared explicitly. In Java, one might write:

String x = "foo";

This would explicitly set x's type to String. But other languages like Scala and C# 3.0 allow the compiler to infer the type, rather than requiring an explicit type definition from the programmer. Here is the example in Scala:

val x = "foo"

The programmer could have typed:

val x:String = "foo"

But the compiler is clever enough to know that x is a String without explicit definition.

Casting in C, especially casting to and from void*, can cause loss of data when the destination has a smaller bit-width than the source.
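The loss of data from a narrowing conversion can be imitated with plain arithmetic. This sketch assumes unsigned values and simply keeps the low-order bits, which is what typically happens when a wide unsigned integer is cast to a narrower one in C; the helper name narrow_unsigned is invented for the illustration.

```python
def narrow_unsigned(value, bits):
    """Keep only the lowest `bits` bits, discarding the rest."""
    return value & ((1 << bits) - 1)

wide = 0x12345678                    # fits in 32 bits
narrow = narrow_unsigned(wide, 16)   # squeezed into 16 bits

print(hex(wide))
print(hex(narrow))  # the upper half (0x1234) has been lost
```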

A related idea is "duck typing", found in many dynamically-typed languages, in which the suitability of a value is determined by how it is used in the code rather than by a declared type. The name refers to the saying "If it walks like a duck and quacks like a duck, it's a duck".
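Duck typing can be sketched in Python as follows. The classes Duck and Robot and the function make_it_speak are hypothetical; the two classes are unrelated, and the function works with either because it only relies on the presence of a speak method.

```python
class Duck:
    def speak(self):
        return "Quack"

class Robot:
    def speak(self):
        return "Beep"

def make_it_speak(thing):
    # No type is checked here: anything with a speak() method will do.
    return thing.speak()

sounds = [make_it_speak(thing) for thing in (Duck(), Robot())]
print(sounds)

# An object without speak() is only rejected at run time, when the call is attempted.
try:
    make_it_speak(42)
    failed = False
except AttributeError:
    failed = True
print(failed)
```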

Declarative vs. Imperative

Examples of declarative languages are SQL, Prolog and Erlang. Most other languages are mostly imperative; see the list of programming languages. Declarative languages tend to be very terse and describe only what task the programmer wishes to accomplish, without including the details of how to do it. Imperative languages tell the machine both "what" to do and "how" to do it. For instance, in SQL:
select * from people order by last_name;
gives a sorted list of people but does not specify which sorting algorithm is used. One could argue that libraries of functions that abstract out the details of execution are declarative. Prolog and SQL code do specify some details, so the boundary between declarative and imperative is not strict.
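The contrast can be sketched by solving the same task twice in Python: once by handing a declarative SQL query to the standard sqlite3 module, and once with a hand-written insertion sort that spells out every step. The table layout and the sample names are assumptions made for this illustration.

```python
import sqlite3

people = [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")]

# Declarative: state what is wanted; the database decides how to sort.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)
declarative = conn.execute(
    "SELECT first_name, last_name FROM people ORDER BY last_name"
).fetchall()
conn.close()

# Imperative: spell out how to sort, step by step (an insertion sort).
def insertion_sort_by_last_name(rows):
    result = []
    for row in rows:
        i = len(result)
        # Walk left past every row whose last name sorts after this one.
        while i > 0 and result[i - 1][1] > row[1]:
            i -= 1
        result.insert(i, row)
    return result

imperative = insertion_sort_by_last_name(people)
print(declarative)
print(imperative)
```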

Strict vs Lazy

Real-time vs non-Real-time

Serial vs Parallel

Few languages are designed to be parallel. Occam and Erlang are pure parallel languages. More often, serial languages are extended with libraries that give access to parallel hardware. An example of a parallel library is PVM (Parallel Virtual Machine). Sometimes libraries provide a data coordination language such as Linda or Gamma. Parallel programs often use either shared memory or message passing. Linda and Gamma combine shared memory and message passing by using a framework called tuple space. A tuple space is a pool of data or tasks that many processors work on at the same time. JavaSpaces is a Java version of Linda. Major categories of parallel programming are SIMD (single instruction, multiple data) and MIMD (multiple instruction, multiple data). See Parallel computation for more details. The RenderMan Shading Language and GLSL are examples of special-purpose SIMD parallel languages designed for rendering images on GPUs or render farms.
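The idea of a shared pool of tasks that several workers draw from can be sketched loosely with Python's standard queue and threading modules. This is not Linda or a real tuple space, only an analogy: the pool holds plain numbers, the worker function is invented for the example, and threads in CPython illustrate the coordination pattern rather than a genuine speed-up.

```python
import queue
import threading

def worker(tasks, results):
    """Take tasks from the shared pool until it is empty."""
    while True:
        try:
            n = tasks.get_nowait()
        except queue.Empty:
            return
        results.put((n, n * n))  # put the finished piece of work back in a pool

tasks = queue.Queue()
results = queue.Queue()
for n in range(10):
    tasks.put(n)  # fill the pool before any worker starts

threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

squares = dict(results.get() for _ in range(10))
print(sorted(squares.items()))
```

Which worker handles which task is not determined in advance; only the combined result is.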

In languages not specifically designed for concurrency, concurrency is often implemented through a specific language construct or library, often tied to a design pattern. For instance, Scala implements concurrency through Actors.

Dynamic languages

Scripting languages

Scripting languages tend to be interpreted, trading execution speed for convenience, and so are usually slower than compiled languages. There is a category of shell scripting languages for the command-line interfaces of Unix and Linux, such as sh, csh, bash, tcsh and zsh. Python is considered a scripting language even though it is semi-compiled. There are scripting languages for applications, such as Lua for SciTE and Emacs Lisp for Emacs. Scheme and other languages can be used to script GIMP. JavaScript/ECMAScript is used as a standard language to script web browsers, although it can be used elsewhere, for instance in the Rhino interpreter. Scripting languages tend to have automatic memory management, dynamic typing, associative arrays and other rapid prototyping features.
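A few of these features can be seen together in a short Python sketch that counts words. The sample text is made up; the dictionary is Python's associative array, no types are declared, and no memory is allocated or freed by hand.

```python
text = "the quick brown fox jumps over the lazy dog the end"

# An associative array (dict) mapping each word to its count.
counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

most_common = max(counts, key=counts.get)
print(most_common, counts[most_common])
```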

Assemblers

In the first computers, programmers had to work with binary machine code, which was very tedious and difficult. It was a huge breakthrough when someone wrote the first "assembler", a program that translates human-readable mnemonic words (written in plain text) into binary machine code. There is usually a one-to-one correspondence between assembler source code mnemonics (commands) and machine code instructions. A different assembler had to be written for each kind of computer, because each kind has a different machine instruction set, so there are many different assembler languages in existence (they are sometimes also called assembly languages). Assemblers were precursors to high-level programming languages. In fact, compilers often translate high-level program source code in two stages: first from human-readable high-level instructions to assembler, then from barely-human-readable assembler to machine code.
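The one-to-one translation an assembler performs can be sketched with a toy example in Python. The instruction set here (LOAD, ADD, STORE, HALT and their opcode numbers) is entirely invented for the illustration and does not correspond to any real processor; each instruction is encoded as one opcode byte followed by one operand byte.

```python
# Hypothetical instruction set: mnemonic -> opcode byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

def assemble(source):
    """Translate each mnemonic line into exactly one two-byte instruction."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0]
        operand = int(parts[1]) if len(parts) > 1 else 0
        machine_code += bytes([OPCODES[mnemonic], operand])
    return bytes(machine_code)

program = """
LOAD 10   ; put 10 in the accumulator
ADD 32    ; add 32 to it
STORE 0   ; write the result to address 0
HALT
"""
binary = assemble(program)
print(binary.hex())
```

A real assembler also handles labels, addressing modes and variable-length instructions, which this sketch leaves out.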


Popularity of programming languages

It is very hard to know the true popularity of programming languages because objective information is lacking. Nevertheless, C (with C++, its object-oriented derivative) and Java seem to be the most popular languages, ahead of PHP and Perl, which are nonetheless very active in the Internet community. The TIOBE Programming Community index [1] calculates the popularity of programming languages every month, based on search-engine criteria. ohloh.net [2] presents a graphical statistical comparison based on coding metrics (such as the number of projects and lines of code). As of October 2007, the number of projects stored in the freshmeat.net repository [3] or in the Sourceforge.net repository [4] shows the same tendency.

Some people who wish to track trends in programming languages use statistics from technical book publishers such as O'Reilly to infer the popularity of the relevant languages[5]. There are problems with using this as a measure: some programming languages provide more comprehensive free online documentation, so programmers do not need to purchase books in order to learn them. Additionally, for smaller languages, where only a small number of books are published, the users of that language do not necessarily purchase books from all the different publishers.

Statistics of this kind tell us about current technical tendencies, making it possible to identify market trends that can be important for anticipating requirements in training, qualified employment and so on. That said, some programmers have criticised over-reliance on statistics about programming language popularity as being driven by fashion rather than technical excellence; this criticism is based on the view that programming languages are standards rather than languages[6].

References