Introduction

This document is intended to be read as a general overview of Internet/enterprise technology to bring newcomers to this domain up to speed. The contents of this document are arranged to provide:
The document was originally researched and written by me in 1997. However, it is still very relevant today (c. 2005). I am publishing it here freely so that beginner ASP.NET developers can appreciate and understand how ASP.NET and database connection technology came about and how it has evolved over the years. I hope you find enough information here to get you started developing enterprise applications for the Internet. Comments and suggestions are welcome at: kevleyski@hotmail.com.

Kevin Staunton-Lambert BSCS

Quick overview of the Internet and TCP/IP

The ‘Internet’ was the name given to the project and prototype system originally developed by the Advanced Research Projects Agency (ARPA) to investigate ways of getting incompatible computer networks to communicate with one another. Through this project, two fundamental software standards were developed: the Transmission Control Protocol (TCP) and the Internet Protocol (IP).
These software standards are generally referred to as ‘TCP/IP’ (Tea Sea Pea Eye Pea). However, a more precise title is the ‘TCP/IP protocol suite’, because this software also includes other protocols. One example is the User Datagram Protocol (UDP), which is used for short packets of data, such as live video and audio, that do not require guaranteed or in-order delivery.

Any machine connected to the Internet has a unique Internet (IP) address. The IP address is a four-byte (32-bit) code, which has the potential to support over 4 billion machines. It is assigned so that all machines belonging to the same network share the same prefix. (This is similar to telephone numbers being grouped by area code; however, geographic location is irrelevant when assigning IP addresses.) An improved IP addressing system known as IPng (IP Next Generation, now known as IPv6) is currently being developed. It uses sixteen bytes (128 bits) to support roughly 3.4 x 10^38 addresses. That is enough to make it possible to control any light switch in any house on the planet without any worry of running out of addresses. (I expect that we will see some interesting computer viruses should this ever be implemented!)

Humans are generally not very good at remembering IP addresses, so machines are also given a host name, such as www.codeproject.com. This name can then be used within a Uniform Resource Locator (URL) or, more generally, a Uniform Resource Identifier (URI), such as http://www.codeproject.com. Host names, along with their IP addresses, are held as entries in the Domain Name System (DNS). The DNS is essentially a distributed database maintained by a series of computers on the Internet known as 'domain name servers'.

The World Wide Web (WWW)

The World Wide Web (WWW) was originally developed by Tim Berners-Lee and other scientists at the CERN laboratory (in Geneva, just on the French border) to allow particle physicists to share information around the world.
Today ‘The Web’, as it is generally called, is used by millions of people around the world to exchange and organize hypermedia (text, graphics, sound, etc.) over the Internet. Estimating the size of the Web is a near impossible task. On 25th May 1996, 'Internet Solutions' estimated that there were 59,628,024 people accessing 304,177 sites; today, in 1998, these figures may have nearly doubled. The Web’s success has been achieved by creating hypermedia document standards:
Mark-up Languages

HTML (HyperText Markup Language)

HTML is the standard mark-up language for web documents transferred using the HyperText Transfer Protocol (HTTP). It provides formatting tags, and anchor tags that hyperlink to other web documents and scripts held on a web server. The syntax of HTML was adapted from SGML (Standard Generalized Markup Language, ISO 8879:1986). It is very easy to learn and is ideal for use on the web because of its small size. (Compare the size of a formatted HTML document with a similarly formatted document written using a word processor.) HTML documents are generally held in ASCII (American Standard Code for Information Interchange) format, which is a standard across most software/hardware platforms. International (Unicode) characters can also be added to documents by writing their HTML abbreviation (a character entity reference) after an ampersand (&) character, followed by a semicolon (;). (For example, the ampersand character itself is marked up as &amp;.) Although there is plenty of material covering HTML, we need to look at some basic HTML structures because we will keep coming across them in the following sections.

HTML document structure and Tags

HTML documents always have a basic structure made up of a header and a body. The header contains the document title and other meta-data, such as the author's name and date. Documents are essentially plain text with formatting tags, very similar in principle to those used by early word processors such as WordPerfect for DOS. HTML tags are defined between less-than (<) and greater-than (>) symbols and generally come in pairs*. A pair consists of a section start tag (e.g., <big> to make text larger) and a section closing tag (e.g., </big> to return it to normal).

* Some browsers, such as Microsoft Internet Explorer, allow lazy HTML as well as strict HTML. This allows you to omit some tags which may seem obvious, such as the tag used to end a row in a table (</tr>).
However, it is very important to keep to the official strict syntax defined by the World Wide Web Consortium (W3C) so that we can maintain software independence throughout the web. The following HTML script demonstrates some of the formatting tags. Notice the use of indentation to identify the affected section between a start and closing tag. This is valid HTML because runs of whitespace (tabs, line breaks and multiple spaces) are treated as a single space. (To force double spaces, you need to use the HTML abbreviation &nbsp;.)
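Two of the text rules just described can be sketched in C++. The sketch assumes a program (such as a CGI script) that generates HTML. The function escapeHtml replaces the characters HTML reserves for mark-up with character entity references such as &amp;. The function collapseWhitespace approximates how a browser displays a run of tabs, line breaks and spaces as a single space. Both function names are our own. Real browsers apply further rules (for example inside <pre> sections) that this sketch ignores.

```cpp
#include <cctype>
#include <string>

// Replace each character that HTML reserves for mark-up with its
// character entity reference, e.g. '&' becomes "&amp;".
std::string escapeHtml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}

// Approximate how a browser renders ordinary body text: any run of
// spaces, tabs and line breaks is displayed as one space. Entities such
// as &nbsp; are not whitespace in the source, so they survive untouched.
std::string collapseWhitespace(const std::string& text) {
    std::string out;
    bool inSpace = false;
    for (char c : text) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            if (!inSpace) out += ' ';  // keep only the first space of a run
            inSpace = true;
        } else {
            out += c;
            inSpace = false;
        }
    }
    return out;
}
```

A CGI program would typically pass any user-supplied text (such as a query string) through escapeHtml before writing it into its HTML output.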
XML (eXtensible Mark-up Language)

So far we have looked at the syntax (grammar) of HTML. However, as with all languages, we should also consider the semantics (meaning) of the information we are portraying. For example, if text has been marked up in bold, it could be because it has more importance, or it might simply be because the author wants it to look like that. Similarly, we might use a colored typeface to denote a title, or it might be personal preference to make the page appear more attractive. Essentially, the point of this argument is this: if text is to be displayed in bold because it is of more importance, then it should be marked up with the logical tag <strong> rather than the physical tag <b>. As mentioned above, we can override existing logical tags, and define our own, using cascading style sheets. However, this can look clumsy, e.g., having to repeat tags such as … To tackle this problem, XML, unlike HTML, does not use preset tags, so it is entirely up to the author which tags are used and what they mean. The use of nested metadata tags, i.e. information about information, is also introduced. For example, say that we have a piece of information related to this paragraph. We might physically mark it up using HTML as follows:

<b>Classification number</b> 1:02:01:13:00 <br>
<b>Thesaurus entry</b> XML <br>
<b>Article</b> Extensible Markup Language <br>
<b>URL</b> /dissertation/week2/HTML.html <br>
<b>Bookmark</b> XML
In XML, we would instead consider the semantics of the information, such as:

<Article>
    <Classification>
        <Level1> 1 </Level1>
        <Level2> 02 </Level2>
        <Level3> 01 </Level3>
        <Level4> 13 </Level4>
        <Level5> 00 </Level5>
    </Classification>
    <Thesaurus> XML </Thesaurus>
    <Article> Extensible Markup Language </Article>
    <URL> /dissertation/week2/HTML.html </URL>
    <Bookmark> XML </Bookmark>
</Article>
The author is then left to their own devices in writing an XSL (Extensible Stylesheet Language) style sheet (see www.w3.org/Style/). This style sheet will correctly format any section marked up as 'Article' on a web page back to the original desired HTML format.

Web Forms

Forms are used to pass user input from an HTML document to a web server. You should already be quite familiar with the behavior of the form controls, as they are used extensively in Graphical User Interfaces (GUIs) such as MS Windows, MacOS and X-Windows. (Microsoft ActiveX Controls can also be incorporated into a form to provide additional input types (e.g., Date format) and user input styles.) To better understand how forms are actually processed by the web server, set the form action to call the ISAPI example described later. (Try
CGI: The Common Gateway Interface

The Common Gateway Interface is a specification for creating executable programs (CGI scripts) that can be run by a web server to carry out dynamic tasks such as:
CGI scripts are very easy to create using languages that can write to standard output (the console), such as C/C++, Pascal, Visual Basic and Perl (Practical Extraction and Report Language). The web server passes the output of these programs directly to the calling web browser rather than to the user's console (screen / client window). The web server generally passes data to the CGI program (script) via environment variables. The program reads them in the same way that it would read any operating system environment variable (e.g. MS-DOS %PATH%). Form data is usually sent to a CGI script using one of two methods, GET and POST. With the GET method (the default), the data is placed in the QUERY_STRING environment variable. With the POST method, the script reads the data from its standard input, and the CONTENT_LENGTH environment variable gives the number of bytes. Before we plunge into CGI scripting, we need to know a little about the most useful of the environment variables, the query string. We also need to know about the use of MIME (Multipurpose Internet Mail Extensions).

CGI: Query Strings

The query string is information that can be passed to a CGI script by a web browser via the URL. If you have used an Internet search engine such as Yahoo, you may have noticed odd characters (such as ?, &, % and +) appearing in the browser's address bar. For example, if you start Yahoo (http://www.yahoo.co.uk/) and search for 'Writing CGI Scripts using C++', the browser passes the following URL: http://search.yahoo.co.uk/search/ukie?p=Writing+CGI+Scripts+using+C%2B%2B&y=y. This URL actions (calls/executes) the program ukie located at http://search.yahoo.co.uk/search. It passes the data p=Writing+CGI+Scripts+using+C%2B%2B&y=y to the program as a query string. (N.B. The first question mark (?) is not included as part of the query string.)
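Here is a minimal C++ sketch of what a CGI program does with this data, assuming a POSIX-style environment. For GET, the data arrives in the QUERY_STRING environment variable. For POST, it is read from standard input, and CONTENT_LENGTH gives its size in bytes. The data is then split at each '&' and decoded: '+' goes back to a space, and %XX goes back to the character with that hexadecimal code. This is the same breakdown that is walked through by hand for the Yahoo query string. The names readCgiInput, hexValue, urlDecode and parseQueryString are illustrative, not taken from any library, and error handling is kept to a minimum.

```cpp
#include <cstdlib>
#include <istream>
#include <map>
#include <sstream>  // std::istringstream can stand in for std::cin when testing
#include <string>

// Fetch the raw form data handed to a CGI program.
// GET:  the server places it in the QUERY_STRING environment variable.
// POST: it arrives on standard input; CONTENT_LENGTH gives its size.
// 'body' would normally be std::cin.
std::string readCgiInput(std::istream& body) {
    const char* method = std::getenv("REQUEST_METHOD");
    if (method != nullptr && std::string(method) == "POST") {
        const char* len = std::getenv("CONTENT_LENGTH");
        long n = (len != nullptr) ? std::strtol(len, nullptr, 10) : 0;
        if (n <= 0) return "";
        std::string data(static_cast<std::size_t>(n), '\0');
        body.read(&data[0], n);
        data.resize(static_cast<std::size_t>(body.gcount()));  // in case fewer bytes arrived
        return data;
    }
    const char* query = std::getenv("QUERY_STRING");
    return (query != nullptr) ? query : "";
}

// Value of one hexadecimal digit, or -1 if it is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Undo URL encoding: '+' becomes a space and %XX becomes the byte 0xXX.
std::string urlDecode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out += ' ';
        } else if (s[i] == '%' && i + 2 < s.size() &&
                   hexValue(s[i + 1]) >= 0 && hexValue(s[i + 2]) >= 0) {
            out += static_cast<char>(hexValue(s[i + 1]) * 16 + hexValue(s[i + 2]));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += s[i];
        }
    }
    return out;
}

// Split "name=value&name=value" into a map, decoding names and values.
std::map<std::string, std::string> parseQueryString(const std::string& q) {
    std::map<std::string, std::string> params;
    std::size_t start = 0;
    while (start <= q.size()) {
        std::size_t end = q.find('&', start);
        if (end == std::string::npos) end = q.size();
        std::string pair = q.substr(start, end - start);
        if (!pair.empty()) {
            std::size_t eq = pair.find('=');
            std::string name = urlDecode(pair.substr(0, eq));
            std::string value = (eq == std::string::npos) ? "" : urlDecode(pair.substr(eq + 1));
            params[name] = value;
        }
        start = end + 1;  // move past the '&'
    }
    return params;
}
```

In a real script, readCgiInput(std::cin) would be called once, before anything is written to std::cout. Passing the Yahoo query string to parseQueryString gives p = "Writing CGI Scripts using C++" and y = "y".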
The query string p=Writing+CGI+Scripts+using+C%2B%2B&y=y can be broken down into two parameters, separated by the ampersand symbol '&':

p=Writing+CGI+Scripts+using+C%2B%2B

This first parameter 'p' is equal to the original search specification entered, 'Writing CGI Scripts using C++'. However, because white space characters are not allowed in a URL, the HTML form converts the spaces to + symbols. Because + symbols now mean spaces, the two + symbols (in C++) are converted to their URL-encoded form %2B (the hexadecimal ASCII code for the + symbol). (N.B. When data is typed directly into a URL rather than submitted from an HTML form, a space is passed to the CGI script as %20 (the hexadecimal ASCII code for the space character) rather than as the + symbol. Why?)

y=y

This second parameter 'y' is the search space option flag set by the radio button control. It selects either 'All Sites' (y=y) or 'UK and Ireland Sites Only' (y=u).

CGI: HTTP MIME Headers (Multipurpose Internet Mail Extensions)

Some software, such as web browsers and e-mail clients, can handle several types of information, such as plain text, HTML-formatted text and graphic images. For such software, we are required to include some additional information indicating how we want the content to be processed. Up till this point we have not needed to include this information. It is assumed that a document passed to a web browser with a file extension of '.HTM' or '.HTML' should naturally be processed as an HTML document. However, the server passes on the output of a CGI program as raw data rather than as a document. So that the web browser knows how to process the data from the CGI script, we are required to pass an additional piece of information known as a MIME header. In our examples we will be passing HTML-formatted text between our web server and browser (client). This requires the following plain-text MIME header:

Content-Type: text/html <carriage return>
<carriage return>
Without this information, a web browser will interpret any output from our CGI script as plain text. It will either ignore the data or display the formatting tags as regular text. MIME headers must come first in any information passed to a browser. Each header must be on its own line, and the headers must be followed by a blank line (hence the two carriage returns). For more information regarding MIME headers, you should refer to the HTTP/1.1 specification.

CGI Example

CGI 'scripts' can be written using various programming languages (see the Perl example later). However, for the sake of compatibility with the code in this document, we will be using C++ for our scripting. CGI C++ programs are very simple to create. Technicalities such as multi-user file sharing and communications are handled entirely by the web server, so these programs are similar to conventional C++ console programs (i.e. simple DOS or UNIX programs). The following snippet of C++ code simply passes the QUERY_STRING environment variable, supplied by the web server, back to the web browser (client) that called it:
// Include the standard library (getenv) and the I/O stream classes
#include <cstdlib>
#include <iostream>

int main()
{
    // Standard DOS/UNIX environment variable lookup
    const char *EnvVar = std::getenv("QUERY_STRING");
    if (EnvVar == NULL)                    // If there is no query string,
        EnvVar = "No+Parameters+Passed";   // report that no parameters were passed

    std::cout << "Content-Type: text/html\n\n";  // Write HTTP MIME header and blank line
    std::cout << "<html>";                       // Write HTML start tag
    std::cout << "Query String: " << EnvVar;     // Write the query string to the
                                                 // console output stream (cout)
    std::cout << "</html>";                      // Write HTML end tag
    return 0;
}

When we run this program directly from DOS (or UNIX), it simply writes the expected plain text back to the console. However, when the same program is run through a web server, the output is treated as HTML, and the QUERY_STRING environment string is passed back to the web browser as an HTML document.

IIS/ISAPI

The Internet Server Application Programming Interface (ISAPI) can be likened to an advanced form of CGI. CGI works on the principle of executing a program on the server whenever a client (web browser) requests it. There is a major flaw in this principle: each call to the CGI program requires its own individual instance of the program, and thus its own memory space on the server. Therefore, if 50 clients are all accessing (hitting) the server, 50 separate instances of the CGI program are required. The CGI environment variables must also be passed to each of the memory spaces allocated. This is a heavy burden and an inefficient use of server resources. ISAPI programs, however, work on the principle of Dynamic Link Libraries (DLLs), which are loaded once and shared between requests. The drawback is that programming becomes more complicated, because we are required to implement multi-threading in our application.
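Here is a small C++11 sketch of why the shared-DLL model needs multi-threading care. It uses the standard std::thread and std::mutex rather than the Win32 or MFC synchronization primitives an ISAPI extension of this era would have used. The hypothetical HitCounter class stands in for any data held inside a single extension DLL. Many request threads run the same code at once, so each update to the shared data is guarded by a mutex. Without the lock_guard, concurrent increments could be lost.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state inside one extension DLL: a single counter
// that every request-handling thread updates.
class HitCounter {
public:
    void recordHit() {
        std::lock_guard<std::mutex> lock(mutex_);  // one thread at a time
        ++hits_;
    }
    long hits() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return hits_;
    }
private:
    mutable std::mutex mutex_;  // mutable so hits() can lock in a const method
    long hits_ = 0;
};

// Simulate 'clients' simultaneous clients, each served on its own thread
// but all sharing the same HitCounter (as they would share one DLL).
long simulateRequests(int clients, int hitsPerClient) {
    HitCounter counter;
    std::vector<std::thread> workers;
    for (int c = 0; c < clients; ++c) {
        workers.emplace_back([&counter, hitsPerClient] {
            for (int h = 0; h < hitsPerClient; ++h) counter.recordHit();
        });
    }
    for (std::thread& t : workers) t.join();  // wait for every request to finish
    return counter.hits();
}
```

With 50 clients each making 1000 hits, simulateRequests(50, 1000) returns 50000 every time. A CGI program never faces this issue for in-memory data, because each request gets its own process.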
However, for our examples we will be using the Microsoft Foundation Classes (MFC), which hide the multi-threaded work in two classes. For our purposes we will be writing CGI-type applications, officially known as ISAPI Extensions. There is, however, another side to this technology, known as ISAPI Filters, which are used to intercept information as it is being passed through the web server. Filters allow us to carry out tasks such as usage logging, user identification and security. NSAPI (Netscape Server Application Programming Interface) is, as the name suggests, the programming interface for Netscape web servers. However, Netscape is intending to change its server architecture to use ISAPI (Microsoft's ActiveX technology). (ISAPI extension DLLs are executed from web pages in exactly the same way that we execute CGI scripts, i.e. as a hyperlink or actioned from a form.)

ISAPI Extensions

MFC has seven classes related to ISAPI. Of these, we will be using the two that are required to create an ISAPI extension DLL. These are: CHttpServer and CHttpServerContext.
With CGI C++ scripting, we were required to process the QUERY_STRING environment string manually to establish which parameters had been passed to the script. With ISAPI C++ programming, we can use a far simpler method, known as parse mapping, to do this for us.

ISAPI Parse Mapping (with MFC)

A parse map is a set of MFC (Microsoft Foundation Class) macros used to bind (map) a function to a command and the parameters specified in the query string passed to our DLL. The easiest way to describe parse mapping is by demonstration. A parse map is declared in the class (DECLARE_PARSE_MAP) and defined with the BEGIN_PARSE_MAP ... END_PARSE_MAP macros:

// ISAPIExample.h
// Define the class (which inherits CHttpServer),
// declaring the PARSE_MAP and our functions
class CISAPIExampleExtension : public CHttpServer
{
public:
    // Declare PARSE_MAP
    DECLARE_PARSE_MAP()

    // Prototypes for our functions
    void Example1(CHttpServerContext* pCtxt);
    void Example2(CHttpServerContext* pCtxt,
                  LPCTSTR Param1, LPCTSTR Param2, LPCTSTR Param3, LPCTSTR Param4,
                  LPCTSTR Param5, LPCTSTR Param6, LPCTSTR Param7, LPCTSTR Param8,
                  LPCTSTR Param9, LPCTSTR Param10);
};

// ISAPIExample.cpp
// Parse map definition, used to process
// parameters specified in the query string
BEGIN_PARSE_MAP(CISAPIExampleExtension, CHttpServer)
    // Handle Example1, which takes no parameters
    ON_PARSE_COMMAND(Example1, CISAPIExampleExtension, ITS_EMPTY)

    // Handle Example2, which takes up to 10 parameters
    // (all parameters default to having no value
    // if they are not submitted)
    ON_PARSE_COMMAND(Example2, CISAPIExampleExtension,
        ITS_PSTR ITS_PSTR ITS_PSTR ITS_PSTR ITS_PSTR
        ITS_PSTR ITS_PSTR ITS_PSTR ITS_PSTR ITS_PSTR)
    ON_PARSE_COMMAND_PARAMS("line1= line2= line3= line4= line5= line6= line7= line8= line9= line10=")

    // Set the default to Example1 (i.e. execute Example1
    // when the query string is empty)
    DEFAULT_PARSE_COMMAND(Example1, CISAPIExampleExtension)
END_PARSE_MAP(CISAPIExampleExtension)

Essentially, this parse map will carry out the following operations depending on the query string passed to the DLL.
ISAPIExample.dll - No query string specified, so the default function Example1, which takes no parameters, is executed (defined by DEFAULT_PARSE_COMMAND).
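For readers without MFC to hand, the following is a rough plain C++ analogue of what the parse map above arranges. It is not MFC, nor how MFC implements parse maps internally. A command name selects a handler, any named parameter that is not supplied defaults to an empty value, and an empty command runs the default handler. CommandMap, on(), setDefault() and dispatch() are hypothetical names. Every parameter is treated as a string, whereas MFC converts values according to the ITS_ types listed in the map.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical analogue of a parse map (plain C++, not MFC).
using Params  = std::map<std::string, std::string>;
using Handler = std::function<std::string(const Params&)>;

class CommandMap {
public:
    // Bind a command name to a handler and the parameter names it accepts
    // (roughly ON_PARSE_COMMAND plus ON_PARSE_COMMAND_PARAMS).
    void on(const std::string& command, std::vector<std::string> names, Handler h) {
        commands_[command] = Entry{std::move(names), std::move(h)};
    }

    // Command to run when none is given (roughly DEFAULT_PARSE_COMMAND).
    void setDefault(const std::string& command) { default_ = command; }

    // Look up the command, fill in missing parameters with "" and call it.
    std::string dispatch(const std::string& command, const Params& supplied) const {
        const std::string& name = command.empty() ? default_ : command;
        auto it = commands_.find(name);
        if (it == commands_.end()) return "unknown command";
        Params args;
        for (const std::string& p : it->second.names) {
            auto found = supplied.find(p);
            args[p] = (found != supplied.end()) ? found->second : "";  // default: no value
        }
        return it->second.handler(args);
    }

private:
    struct Entry {
        std::vector<std::string> names;
        Handler handler;
    };
    std::map<std::string, Entry> commands_;
    std::string default_;
};
```

Registering "Example1" with no parameter names and "Example2" with line1 ... line10, then calling setDefault("Example1"), reproduces the behaviour described above. An empty command runs Example1, and Example2 receives empty strings for any lines that were not submitted.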
Database connections (ODBC, OLE DB and ADO)

Database Management Systems (DBMS), such as MS SQL Server, MySQL and Oracle, were designed to save software developers the trouble of writing their own code for tasks such as multi-user file handling, indexing/searching (querying) and data security. ODBC (Open Database Connectivity) is a standard developed by Microsoft. It bridges the gap between two things: a data source, such as an SQL database or indeed a simple text file (supported by the MS Jet Engine), and an application that abides by the ODBC rules (such as Crystal Reports). For the software developer, this generic standard is particularly useful when it comes to system portability. For example, an application can be written to process a spreadsheet and update a database at the same time, using essentially the same coding principles. In another example, an application written to manipulate data in an Excel worksheet could be upsized to an Oracle database without any changes to the application. Likewise, if the application were moved, perhaps from an MS Windows to an Apple Macintosh environment, there would be no need to make changes to the database. The foundations of ODBC are based upon an open standard produced by the SQL Access Group (SAG). That standard is in turn based on the well-known relational database query language, Structured Query Language (SQL), which you should be familiar with. Before we can get stuck in with ODBC, OLE DB and ADO, we need to know a little about the principles of establishing a connection to a database.

Data Source Names (DSN) and Connection Strings

To register a database with the ODBC Manager, we need to create a unique Data Source Name (DSN) entry. DSN entries vary between different ODBC drivers, and details for these will be provided with your database server. After creating a DSN entry, we can test the connection by using any ODBC application to connect to the database.
Connections are made by passing a connection string to the ODBC manager. The string contains information such as the DSN and login details. A connection string to connect to my example MS SQL database is:

ODBC;DSN=InternetPAL;UID=sa;PWD=;
The ODBC manager then processes this string, fills in the missing information and presents the full ODBC connection string, e.g.:

ODBC;DSN=InternetPAL;UID=sa;PWD=;APP=ODBC Test Program;WSID=KEVS;LANGUAGE=us_english;DATABASE=InternetPAL
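The following C++ sketch splits a connection string into its keyword/value pairs, to make the structure of these strings concrete. parseConnectString is a hypothetical helper, not part of the ODBC API. It skips parts without an '=' sign, such as the leading "ODBC;" marker used in these MFC connection strings. It also ignores refinements of the real ODBC syntax, such as case-insensitive keywords and values enclosed in braces.

```cpp
#include <map>
#include <string>

// Split "KEY=value;KEY=value;..." into a map of keyword -> value.
// Simplification: values containing ';' (e.g. inside {braces}) are not handled.
std::map<std::string, std::string> parseConnectString(const std::string& conn) {
    std::map<std::string, std::string> attrs;
    std::size_t start = 0;
    while (start < conn.size()) {
        std::size_t end = conn.find(';', start);
        if (end == std::string::npos) end = conn.size();
        std::string part = conn.substr(start, end - start);
        std::size_t eq = part.find('=');
        if (eq != std::string::npos) {
            attrs[part.substr(0, eq)] = part.substr(eq + 1);  // empty values (PWD=) are kept
        }  // parts without '=' (such as the "ODBC" prefix) are skipped
        start = end + 1;
    }
    return attrs;
}
```

Applied to the full string above, the sketch yields seven entries, including DSN = "InternetPAL", an empty PWD and APP = "ODBC Test Program".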
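Using MFC, it is very easy to make an ODBC connection. Essentially, all we need to do is create a database object (CDatabase) and call its Open() method. We pass an empty DSN together with the bare connection string "ODBC;", which leaves the ODBC manager to prompt the user for the data source and login details:

CDatabase m_database;
m_database.Open("", FALSE, TRUE, "ODBC;", FALSE);  // DSN, exclusive, read-only, connect string, use cursor library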
This is the completed example function, which builds a connection string from the ODBC dialog boxes and connects to a database:

// Open ODBC connection (opens the default connection string 'ODBC;')
void CODBCTestDoc::OpenOdbc()
{
    m_strConnect = "ODBC;";   // Set the default connection string,
                              // i.e. no database name/login details
    BeginWaitCursor();        // Show the hourglass mouse pointer
                              // while we are busy connecting

    // Attempt to send the connection string to the ODBC manager
    BOOL bRet = FALSE;
    try
    {
        bRet = m_database.Open("", FALSE, TRUE, m_strConnect, FALSE);
    }
    catch (CDBException* pe)  // Catch the ODBC exception if there was a problem
    {
        AfxMessageBox(pe->m_strError);  // Show the ODBC error in a message box
        EndWaitCursor();                // Restore the normal mouse pointer
        pe->Delete();                   // Free the exception object
        return;                         // Exit the function
    }
    EndWaitCursor();  // Restore the normal mouse pointer

    // If the connection was made, open a recordset
    if (bRet)
    {
        m_strConnect = m_database.GetConnect();  // Connection string the user selected
        CDocument::SetTitle(m_strConnect);       // Use it as the document title
        OpenRecordset();                         // Open a recordset on this connection
    }
}

Database recordsets

Once we have a connection to a database, we can create a recordset (also traditionally known in the database world as a rowset). There are several types, each with its own advantages and disadvantages:

Dynasets - Allow bi-directional scrolling (MoveNext / MovePrevious). Data content changes can be seen by issuing

Snapshots - Similar to the camera principle, in that a photograph of the data is taken. Bi-directional scrolling is still permitted; however, the data shown is not refreshed until the recordset is closed and then re-opened.

Dynamic - Similar to the dynaset principle; however, changes made by other users, including changes to the record sort order, are also reflected.
(Not widely supported by DBMSs.)

Forward Only - Recordsets can only be scrolled from beginning to end, and can only be read from. This has significant speed advantages; however, we need to close and re-open the recordset to start again.

Recordsets (CRecordset) are created on our database object and opened with an SQL query:

CRecordset* m_pRecordset;                      // Pointer to a recordset object
m_pRecordset = new CRecordset(&m_database);    // Attach the recordset to our database object
m_pRecordset->Open(CRecordset::dynaset, "select ... from [Sites]", CRecordset::readOnly);

or, to call a stored procedure:

m_pRecordset->Open(CRecordset::dynaset, "{CALL ODBCTest}", CRecordset::readOnly);
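The difference between a snapshot and a dynaset can be illustrated with a toy model in C++. No ODBC is involved, and Table, Snapshot and Dynaset are hypothetical types. The snapshot copies every row when it is opened, so later edits are invisible to it. The dynaset remembers only the keys of its rows and re-reads each row when it is fetched. Edits to existing rows therefore show up in the dynaset, while rows added after it was opened do not appear. This roughly mirrors how MFC describes keyset-driven dynasets, but real drivers differ in the details.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model (not ODBC): a "table" maps a primary key to a row value.
using Table = std::map<int, std::string>;

// Snapshot: copies every row when opened, so later changes are invisible.
class Snapshot {
public:
    explicit Snapshot(const Table& t) : rows_(t.begin(), t.end()) {}
    std::string row(std::size_t i) const { return rows_.at(i).second; }
    std::size_t size() const { return rows_.size(); }
private:
    std::vector<std::pair<int, std::string>> rows_;
};

// Dynaset (keyset-driven): remembers only the keys present when opened and
// looks up each row's current value whenever it is fetched.
// The referenced Table must outlive the Dynaset.
class Dynaset {
public:
    explicit Dynaset(const Table& t) : table_(t) {
        for (const auto& r : t) keys_.push_back(r.first);
    }
    std::string row(std::size_t i) const {
        auto it = table_.find(keys_.at(i));
        return it != table_.end() ? it->second : "<deleted>";
    }
    std::size_t size() const { return keys_.size(); }
private:
    const Table& table_;
    std::vector<int> keys_;
};
```

For example, open both on a two-row table, then change row 2 and add row 3. The dynaset shows the changed value of row 2, the snapshot still shows the old one, and neither sees the new row.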
This is the completed example function that creates a dynaset from the query string held in

// Open recordset
void CODBCTestDoc::OpenRecordset()
{
    // Create a new recordset object and pass the SQL query to it
    CRecordset* m_pRecordset;
    m_pRecordset = new CRecordse