Pascal Programming/Strings
Kai Burghardt: Rejected the last 2 text changes (by 188.252.186.179) and restored revision 4247376 by ShakespeareFan00: This is no improvement. One edit will even introduce a programming error.
Latest revision as of 19:15, 18 June 2023
The data type string(…) is used to store a finite sequence of char values.
It is a special case of an array, but unlike an array[…] of char the data type string(…) has some advantages facilitating its effective usage.
The data type string(…) as presented here is an Extended Pascal extension, as defined in the ISO standard 10206.
Due to its high relevance in practice, this topic has been put into the Standard Pascal part of this Wikibook, right after the chapter on arrays.
Properties
Capacity
Definition
The declaration of a string data type always entails a maximum capacity:
program stringDemo(output);
type
	address = string(60);
var
	houseAndStreet: address;
begin
	houseAndStreet := '742 Evergreen Trc.';
	writeLn('Send complaints to:');
	writeLn(houseAndStreet);
end.
The word string is followed by a positive integer surrounded by parentheses.
This is not a function call.[fn 1]
Implications
Variables of the data type address as defined above will only be able to store up to 60 independent char values.
Of course it is possible to store fewer, or even none, but once this limit is set it cannot be expanded.
Inquiry
String variables “know” about their own maximum capacity:
If you use writeLn(houseAndStreet.capacity), this will print 60.
Every string variable automatically has a “field” called capacity.
This field is accessed by writing the respective string variable’s name and the word capacity joined by a dot (.).
This field is read-only:
You cannot assign values to it.
It can only appear in expressions.
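The following sketch, assuming a compiler conforming to ISO 10206 (for instance GPC invoked with --extended-pascal), prints the capacity of two string variables. The program and variable names are illustrative only. It also shows that the capacity stays the same no matter which value the variable currently holds.

```pascal
program capacityDemo(output);
type
	address = string(60);
var
	houseAndStreet: address;
	zip: string(5);
begin
	houseAndStreet := '742 Evergreen Trc.';
	writeLn(houseAndStreet.capacity:1);
	{ assigning a different value does not change the capacity }
	houseAndStreet := '';
	writeLn(houseAndStreet.capacity:1);
	{ a string type can also be given directly in a variable declaration }
	writeLn(zip.capacity:1);
	{ houseAndStreet.capacity := 80;  would be rejected: read-only }
end.
```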
Length
All string variables have a current length.
This is the total number of valid char values a string variable currently contains.
To query this number, the Extended Pascal standard defines a new function called length:
program lengthDemo(output);
type
	domain = string(42);
var
	alphabet: domain;
begin
	alphabet := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
	writeLn(length(alphabet));
end.
The length function returns a non-negative integer value denoting the supplied string’s length.
It also accepts char values.[fn 2]
A char value has by definition a length of 1.
It is guaranteed that the length of a string variable will always be less than or equal to its corresponding capacity.
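A minimal sketch of length applied to an empty string, a non-empty string and a char value, again assuming an Extended Pascal compiler; the identifiers are made up for this example.

```pascal
program lengthValuesDemo(output);
var
	s: string(20);
	c: char;
begin
	s := '';
	writeLn(length(s):1);   { the empty string }
	s := 'Pascal';
	writeLn(length(s):1);   { six characters, well below the capacity of 20 }
	c := 'X';
	writeLn(length(c):1);   { a char value always has length 1 }
end.
```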
Compatibility
You can copy entire string values using the := operator, provided the variable on the left-hand side (LHS) has the same or a greater capacity than the string expression on the right-hand side (RHS).
This is different from a regular array’s behavior, which would require dimensions and size to match exactly.
program stringAssignmentDemo;
type
	zipcode = string(5);
	stateCode = string(2);
var
	zip: zipcode;
	state: stateCode;
begin
	zip := '12345';
	state := 'QQ';
	zip := state; { ✔ }
	{ zip.capacity > state.capacity }
	{ ↯ state := zip; ✘ }
end.
As long as no clipping occurs, i. e. no characters have to be omitted because the capacity is too small, the assignment is fine.
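To make the distinction between capacity and length visible, this sketch (hypothetical program name, Extended Pascal compiler assumed) extends the assignment example by printing zip, its current length and its capacity after the assignment:

```pascal
program assignmentLengthDemo(output);
type
	zipcode = string(5);
	stateCode = string(2);
var
	zip: zipcode;
	state: stateCode;
begin
	zip := '12345';
	state := 'QQ';
	{ two characters fit into a capacity of five }
	zip := state;
	{ the length follows the assigned value, the capacity stays at 5 }
	writeLn(zip, ' ', length(zip):1, ' ', zip.capacity:1);
end.
```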
Index
It is worth noting that otherwise strings are internally regarded as arrays.[fn 3]
Like a character array, you can access (and alter) every array element independently by specifying a valid index surrounded by brackets.
However, there is a big difference with respect to the validity of an index.
At any time, you are only allowed to specify indices that are within the range 1..length.
This range may be empty, specifically if length is currently 0.
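The following sketch, assuming an Extended Pascal compiler and using made-up identifiers, alters characters through indices in 1..length and shows that a loop over 1..length is simply skipped for an empty string:

```pascal
program characterAccessDemo(output);
var
	sample: string(10);
	i: integer;
begin
	sample := 'pascal';
	{ valid indices are 1..length(sample), i.e. 1..6, not 1..capacity }
	for i := 1 to length(sample) do
	begin
		if sample[i] = 'a' then
		begin
			sample[i] := 'A';
		end;
	end;
	writeLn(sample);
	sample := '';
	{ length is now 0: the range 1..0 is empty, the body never runs }
	for i := 1 to length(sample) do
	begin
		writeLn('unreachable');
	end;
	writeLn(length(sample):1);
end.
```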
Standard routines
In addition to the length function, Extended Pascal also defines a few other standard functions operating on strings.
Manipulation
The following functions return strings.
Substring
In order to obtain just a part of a string (or char) expression, the function subStr(stringOrCharacter, firstCharacter, count) returns a sub-string of stringOrCharacter having the non-negative length count, starting at the positive index firstCharacter.
It is important that firstCharacter + count - 1 does not exceed the length of stringOrCharacter, otherwise the function causes an error.[fn 4]
For string variables, the subStr function is the same as specifying myString[firstCharacter..firstCharacter+count-1].[fn 5]
Evidently, if firstCharacter is some complicated expression, the subStr function should be preferred, because the bracket notation requires writing that expression twice, which invites programming mistakes.
Furthermore, the third parameter to subStr can be omitted:
This will simply return the rest of the given string starting at the position indicated by the second parameter.[fn 6]
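A short sketch comparing both forms of subStr with the equivalent bracket notation. It assumes an Extended Pascal compiler that supports substring access (such as GPC in its ISO 10206 mode); the values and names are chosen only for illustration.

```pascal
program subStrDemo(output);
var
	s: string(20);
	first, count: integer;
begin
	s := 'Extended Pascal';
	writeLn(subStr(s, 10, 6));   { 6 characters starting at index 10 }
	writeLn(subStr(s, 1, 8));    { 8 characters starting at index 1 }
	writeLn(subStr(s, 10));      { the rest of s, starting at index 10 }
	first := 10;
	count := 3;
	{ same as subStr(s, first, count); note that "first" appears twice }
	writeLn(s[first..first + count - 1]);
end.
```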
Remove trailing spaces
The trim(source) function returns a copy of source without any trailing space characters, i. e. ' '.
In left-to-right scripts any blanks to the right are considered insignificant, yet in computing they take up (memory) space.
It is advisable to prune strings before writing them, for example, to a disk or other long-term storage media, or before transmitting them via networks.
Admittedly, memory requirements were a more relevant issue prior to the 21st century.
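In the sketch below (Extended Pascal compiler assumed, identifiers hypothetical) the brackets make the removed blanks visible. Leading blanks are kept, because trim as defined by ISO 10206 only affects the end of a string; some other dialects' trim functions strip both ends.

```pascal
program trimDemo(output);
var
	padded, pruned: string(20);
begin
	padded := 'data   ';   { three trailing blanks }
	pruned := trim(padded);
	writeLn(length(padded):1);
	writeLn(length(pruned):1);
	{ brackets make any remaining blanks visible }
	writeLn('[', pruned, ']');
	{ leading blanks are not removed }
	writeLn('[', trim('  data  '), ']');
end.
```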
First occurrence of substring
The function index(source, pattern) finds the first occurrence of pattern in source and returns the starting index.
All characters from pattern match the characters in source at the returned offset:
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | match |
|---|---|---|---|---|---|---|---|---|
| source | Z | Y | X | Y | X | Y | X | |
| pattern, attempt 1 | X | Y | X | | | | | ✘ |
| pattern, attempt 2 | | X | Y | X | | | | ✘ |
| pattern, attempt 3 | | | X | Y | X | | | ✔ |
Note that to obtain the second or any subsequent occurrence, you need to search a proper substring of source.
Because the “empty string” is, mathematically speaking, present everywhere, index(source, '') always returns 1.
Conversely, because a non-empty string cannot occur in an empty string, index('', pattern) always returns 0 for a non-empty pattern; in the context of strings, 0 is otherwise an invalid index.
The value zero is returned if pattern does not occur in source.
This will always be the case if pattern is longer than source.
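The following sketch, assuming an Extended Pascal compiler, reproduces the lookup illustrated in the table and then finds the second occurrence by searching a substring that starts one character after the first match. Adding the relative result to the first offset converts it back to a position in source. The variable names are hypothetical.

```pascal
program indexSearchDemo(output);
var
	source: string(20);
	first, relative: integer;
begin
	source := 'ZYXYXYX';
	first := index(source, 'XYX');
	writeLn(first:1);
	{ search the rest of source, beginning one character after the first
	  match; this assumes first < length(source), as is the case here }
	relative := index(subStr(source, first + 1), 'XYX');
	if relative > 0 then
	begin
		{ relative counts from position first + 1, hence first + relative }
		writeLn(first + relative:1);
	end;
	writeLn(index(source, 'Q'):1);   { pattern absent }
	writeLn(index(source, ''):1);    { empty pattern }
end.
```

Overlapping occurrences are found this way, because the second search starts right after the first match's starting position rather than after its end.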
Operators
The Extended Pascal standard introduced an additional operator for strings of any length, including single characters.
The + operator concatenates two strings or characters, or any combination thereof.
Unlike the arithmetic +, this operator is non-commutative, that is, the order of the operands matters.
| expression | result |
|---|---|
| Template:Nowrap | 'Foobar' |
| Template:Nowrap | '' |
| Template:Nowrap | Template:Nowrap |
Concatenation is useful if you intend to save the data somewhere.
Supplying concatenated strings to routines such as write/writeLn, however, can be disadvantageous:
The concatenation, especially of long strings, first requires allocating enough memory to accommodate the entire resulting string.
Then, all the operands are copied to their respective locations.
This takes time.
Hence, in the case of write/writeLn it is advisable (for very long strings) to use their capability of accepting an arbitrary number of (comma-separated) parameters.
The Free Pascal Compiler (FPC), the GNU Pascal Compiler (GPC) and Delphi also ship with a function concat performing the very same task.
Read the respective compiler’s documentation before using it, because there are some differences, or just stick to the standardized + operator.
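A minimal sketch of the + operator (Extended Pascal compiler assumed, identifiers hypothetical). The last statement passes the parts as separate parameters to writeLn, which avoids building an intermediate string:

```pascal
program concatenationDemo(output);
var
	greeting: string(40);
	mark: char;
begin
	mark := '!';
	{ strings and char values can be mixed freely }
	greeting := 'Hello' + ', ' + 'world' + mark;
	writeLn(greeting);
	{ + is not commutative: the operand order matters }
	writeLn('ab' + 'cd');
	writeLn('cd' + 'ab');
	{ for output only, pass the parts as separate parameters instead }
	writeLn('Hello', ', ', 'world', mark);
end.
```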
Sophisticated comparison
All functions presented in this subsection return a Boolean value.
Order
Since every character in a string has an ordinal value, we can think of a method to sort them. There are two flavors of comparing strings:
- One uses the relational operators already introduced, such as =, > or <=.
- The other one uses dedicated functions like EQ or LT.
The difference lies in their treatment of strings that vary in length. While the former bring both strings to the same length by padding them with space characters (' '), the latter simply clip them to the shorter length, but take into account which one was longer (if necessary).
| function name | meaning | operator |
|---|---|---|
| EQ | equal | = |
| NE | not equal | <> |
| LT | less than | < |
| LE | less than or equal to | <= |
| GT | greater than | > |
| GE | greater than or equal to | >= |
All these functions and operators are binary, that is, they expect and accept exactly two parameters or operands, respectively. Yet a function and its corresponding operator can produce different results when supplied with the same input, as you will see in the next two sub-subsections.
Equality
Let’s start with equality.
- Two strings (of any length) are considered equal by the EQ function if both operands are of the same length and their values, i. e. the character sequences that actually make up the strings, are the same.
- An =‑comparison, on the other hand, augments any “missing” characters in the shorter string by using the padding character space (' ').[fn 7]
To put this relationship in other words, in Pascal terms you already know:
(foo = bar) = EQ(trim(foo), trim(bar))
The actual implementation is usually different, because trim can be quite resource-consuming (time as well as memory), especially for long strings.
As a consequence, an =‑comparison is usually used if trailing spaces are insignificant but are still there for technical reasons (e. g. because you are using an array[…] of char).
Only EQ ensures both strings are lexicographically the same.
Note that the capacity of either string is irrelevant.
The function NE, short for not equal, behaves accordingly.
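The sketch below contrasts = and EQ. It assumes a fully ISO 10206-conforming mode, since only there does = pad the shorter operand (see the GPC remark in the notes); the identifiers are hypothetical. Boolean results are turned into text explicitly, because compilers differ in how writeLn prints Boolean values.

```pascal
program equalityDemo(output);
var
	plain, withBlanks: string(10);
begin
	plain := 'abc';
	withBlanks := 'abc  ';   { two trailing blanks }
	{ = pads the shorter operand with blanks before comparing }
	if plain = withBlanks then
		writeLn('=: equal')
	else
		writeLn('=: not equal');
	{ EQ additionally requires both lengths to match }
	if EQ(plain, withBlanks) then
		writeLn('EQ: equal')
	else
		writeLn('EQ: not equal');
	{ removing trailing blanks first makes EQ agree with = }
	if EQ(trim(plain), trim(withBlanks)) then
		writeLn('EQ after trim: equal')
	else
		writeLn('EQ after trim: not equal');
end.
```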
Less than
A string is determined to be “less than” another one by reading both strings simultaneously from left to right and comparing corresponding characters. If all characters match, the strings are said to be equal to each other. However, if we encounter a differing character pair, processing is aborted and the relation of this character pair determines the relation of the entire strings.
| first operand | 'A' | 'B' | 'C' | 'D' |
|---|---|---|---|---|
| second operand | 'A' | 'B' | 'E' | 'A' |
| determined relation | = | = | < | N/A |
If both strings are of equal length, the LT function and the <‑operator behave the same.
LT actually even builds on top of <.
Things get interesting if the supplied strings differ in length.
- The LT function first cuts both strings to the same (shorter) length (substring).
- Then a regular comparison is performed as demonstrated above. If the shortened, common-length versions turn out to be equal, the (originally) longer string is said to be greater than the other one.
A difference between LT and < can be provoked artificially, but it can still become an issue if you are frequently using characters that are “smaller” than the regular space character, like for instance if you are programming on a 1980s 8‑bit Atari computer using ATASCII. The LE, GT, and GE functions act accordingly.
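The following sketch provokes such a difference on purpose. It assumes an ASCII-based character set, where chr(9), a tab, is “smaller” than the space character, as well as a fully ISO 10206-conforming compiler mode; the identifiers are hypothetical.

```pascal
program lessThanDemo(output);
var
	a, b: string(10);
begin
	a := 'ab';
	b := 'ab' + chr(9);   { assumption: chr(9) is a tab, ordinally below ' ' }
	{ < pads a to 'ab ' and then compares ' ' with chr(9) }
	if a < b then
		writeLn('a < b')
	else
		writeLn('not a < b');
	{ LT clips both to 'ab'; they are equal, so the longer b is greater }
	if LT(a, b) then
		writeLn('LT(a, b)')
	else
		writeLn('not LT(a, b)');
end.
```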
Details on string literals
Inclusion of delimiter
In Pascal, string literals start with and are terminated by the same character.
Usually this is a straight (typewriter’s) apostrophe (').
Trouble arises if you want to include that character in a string literal, because it is already understood as the terminating delimiter.
Conventionally, two straight typewriter’s apostrophes back-to-back are regarded as an apostrophe image.
In the produced computer program, they are replaced by a single apostrophe.
program apostropheDemo(output);
var
	c: char;
begin
	for c := '0' to '9' do
	begin
		writeLn('ord(''', c, ''') = ', ord(c));
	end;
end.
Each double-apostrophe is replaced by a single apostrophe.
The string still needs delimiting apostrophes, so you might end up with three consecutive apostrophes like in the example above, or even four consecutive apostrophes ('''') if you want a char-value consisting of a single apostrophe.
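A small sketch (identifier names hypothetical) showing the doubled apostrophe inside a string literal and in a char constant, and that the doubled form counts as a single character:

```pascal
program quoteDemo(output);
const
	apostrophe = '''';   { a char constant consisting of a single ' }
begin
	writeLn('It''s a string.');
	writeLn(apostrophe, 'quoted', apostrophe);
	{ the doubled apostrophe counts as one character }
	writeLn(length('It''s'):1);
end.
```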
Non-permissible characters
A string is a linear sequence of characters, i. e. along a single dimension.
Consequently, a string literal cannot contain a line break: it has to end on the same line it started.
You are nevertheless allowed to use the operating system-specific code indicating line endings, yet the only cross-platform procedure (i. e. guaranteed to work regardless of the operating system used) is writeLn.
Although not standardized, many compilers provide a constant representing the environment’s character/string necessary to produce line breaks.
In the Free Pascal Compiler (FPC) it is called lineEnding.
Delphi has sLineBreak, which is also understood by FPC for compatibility reasons.
The GNU Pascal Compiler’s (GPC’s) standard module GPC supplies the constant lineBreak.
You will first need to import this module before you can use that identifier.
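A minimal sketch of the portable approach: instead of embedding a line break into a string literal, the program lets writeLn terminate each line. It only uses standard routines, so no compiler-specific constant is needed; the program name is illustrative.

```pascal
program lineBreakDemo(output);
begin
	{ a string literal cannot contain a line break, so writeLn ends the line }
	write('first line');
	writeLn;
	writeLn('second line');
end.
```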
Remainder operator
The final Standard Pascal arithmetic operator you are introduced to, after learning to divide, is the remainder operator mod (short for modulo).
Every integer division (div) may yield a remainder.
For non-negative dividends, mod evaluates to this value; for negative dividends, Pascal nevertheless defines the result to be non-negative, as the table below shows.
| i | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|
| i mod 2 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| i mod 3 | 0 | 1 | 2 | 0 | 1 | 2 | 0 |
Similar to all other division operations, the mod operator does not accept a zero value as the second operand.
Moreover, the second operand to mod must be positive.
There are many definitions, among computer scientists and mathematicians, as regards the result if the divisor is negative.
Pascal avoids any confusion by simply declaring negative divisors as illegal.
The mod operator is frequently used to ensure a certain value remains in a specific range starting at zero: x mod n always lies within 0..n - 1.
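The sketch below reproduces the i mod 3 row of the table and then keeps a hypothetical day counter within 0..6. It assumes a compiler implementing the ISO mod semantics; some dialects, such as Delphi, return negative results for negative dividends instead.

```pascal
program moduloDemo(output);
const
	daysPerWeek = 7;
var
	i, day: integer;
begin
	{ the "i mod 3" row of the table }
	for i := -3 to 3 do
	begin
		write(i mod 3:1, ' ');
	end;
	writeLn;
	{ keep a hypothetical day counter within 0..daysPerWeek - 1 }
	day := 5;
	for i := 1 to 4 do
	begin
		day := (day + 1) mod daysPerWeek;
		write(day:1, ' ');
	end;
	writeLn;
end.
```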
Furthermore, you will find modulo in number theory.
For example, the definition of prime numbers says “not divisible by any number other than 1 and itself”.
The phrase “is divisible by” can be translated into Pascal like this:
| expression | n is divisible by d |
|---|---|
| mathematical notation | $d \mid n$ |
| Pascal expression | n mod d = 0 |
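A sketch of a primality test built on the expression n mod d = 0. The function isPrime is hypothetical and only checks divisors d with d * d <= n, which suffices because any factorization of n has a factor no greater than the square root of n.

```pascal
program primeDemo(output);
var
	candidate: integer;

{ hypothetical helper: true if n >= 2 and no d with d * d <= n divides n }
function isPrime(n: integer): Boolean;
var
	d: integer;
	divisorFound: Boolean;
begin
	divisorFound := false;
	d := 2;
	while (d * d <= n) and not divisorFound do
	begin
		{ "n is divisible by d" in Pascal }
		divisorFound := n mod d = 0;
		d := d + 1;
	end;
	isPrime := (n >= 2) and not divisorFound;
end;

begin
	for candidate := 1 to 20 do
	begin
		if isPrime(candidate) then
		begin
			write(candidate:1, ' ');
		end;
	end;
	writeLn;
end.
```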
Notes:
- ↑ In fact this is a discrimination of what Extended Pascal calls a “schema”. Schemata will be explained in detail in the Extensions Part of this Wikibook.
- ↑ This functionality is useful if you are handling constants you or someone else might change at some point. By definition, a literal value consisting of exactly one character is a char value, whereas '' (“null-string”) or '42' are string literals. In order to write generic code, length accepts all kinds of values that could denote a finite sequence of char values.
- ↑ In fact the definition essentially is Template:Nowrap.
- ↑ This means that, in the case of empty strings, only the function call subStr('', 1, 0) could be legal. It goes without saying that such a function call is rather useless.
- ↑ The string variable may not be bindable when using this notation.
- ↑ Omitting the third parameter in the case of empty strings or characters is not allowed. subStr('', 1) is illegal, because there is no “character 1” in an empty string. Also, subStr('Z', 1) is not allowed, because 'Z' is a char expression and as such always has a length of 1, rendering any need for a “give me the rest of/subsequent characters of” function obsolete.
- ↑ If you are a GPC user, you will need to ensure you are in a fully Extended Pascal-compliant mode, for example by specifying --extended-pascal on the command line. Otherwise, no padding occurs. Standard (unextended) Pascal, as per ISO standard 7185, does not define any padding algorithm.