Oracle's iDevelop came to Calgary last Tuesday with three concurrent sessions in each of four time slots. The sessions ranged from development to business strategy and covered Oracle's technology for deploying Web solutions.

I attended two talks on interMedia (one a general overview and the other specific to interMedia text), one on JDeveloper (Oracle's Java development environment), and one on XML and Oracle8i's XML parsing and delivery abilities.

interMedia

interMedia is an option available for the Oracle8i database. If you are familiar with Oracle products, this is Oracle's ConText option, bits of the Spatial option, and some other multimedia handling thrown in for good measure. interMedia is used to store, retrieve, index, and search audio, video, image, text, and locator data. The single integrated package gives you access to all of your data through simple SQL queries. The product is also extensible, so if your favourite file format, compression algorithm, or whatever is not supported, you can write your own handlers and interMedia will understand it.

Three sorts of images are supported: black & white, gray scale, and colour. Most common formats, such as GIF and JFIF, and most compression algorithms, such as LZW and JPEG, are natively supported. Oracle also uses a "headerless" format that allows conversion between the different formats. Through SQL, you will be able to perform image transformations like scale and rotate. There is also software available from third-party partners that will allow you to perform searches based on features in an image.

Audio file formats such as au, wav, aiff and streaming audio are supported as well as video formats like QuickTime. (There's also a rumour that Oracle is reverse-engineering Real Audio's streaming format.) Also, for all multimedia formats, interMedia will perform automatic attribute extraction and indexing. Attributes include things like resolution, embedded comments, etc.

From Oracle's Spatial option comes the ability to store and recall locator information and the new "within" operator. The "within" operator will allow you to specify an area on a map and locate points that are within that geographical region. The geocoder system will allow you to enter a street address and have a longitude and latitude returned. Blockbuster Video uses this system in order to plan where their next stores should be built. They use demographic information from memberships to find an optimal placement of a new outlet.
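To make the idea of a "within" search concrete, here is a minimal Python sketch. It is not Oracle's operator: it assumes the region is a circle around a centre point (the talk only said "an area on a map"), and the function names and store coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two lat/long points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(points, centre, radius_km):
    """Return the names of the points lying within radius_km of centre."""
    lat0, lon0 = centre
    return [name for name, (lat, lon) in points.items()
            if distance_km(lat0, lon0, lat, lon) <= radius_km]


# Hypothetical geocoded locations (approximate latitude, longitude).
stores = {
    "Calgary": (51.05, -114.07),
    "Airdrie": (51.29, -114.01),
    "Edmonton": (53.55, -113.49),
}
nearby = within(stores, centre=(51.05, -114.07), radius_km=50)
```

Here `nearby` holds the Calgary and Airdrie entries; Edmonton is well outside the 50 km circle.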

Information can be loaded into the database with the old standby SQL*Loader, as well as through any TWAIN-compliant scanner or Microsoft's Imaging. When the Internet File System is finally introduced into Oracle8i, you should also be able to drag and drop files straight into the database and have them automatically recognized and indexed appropriately. Oracle also realizes that many people already have datastores for this information and will be reluctant to move everything into the database. Through the use of the BFILE datatype you can link a file's metadata with a file location, a URL, or an FTP specifier.

To get information back out of the database, there is obvious integration with Oracle products like JDeveloper and WebDB, but there is also an Oracle clipboard. Using an SQL query, you can populate the clipboard with an image that can be cut and pasted into other applications. Along with the image, metadata and links to further information to be served from the database are included.

interMedia text allows the storage, retrieval, indexing, searching, etc. of text documents. The documents can be stored in almost any format and can be inside or outside of the database. Content can be searched using fuzzy matching (using a "like" operator), proximity searching (words near other words), section searching, grammatical and multi-lingual stemming, stop words, same words, and relevance ranking.

Section searching is the ability to logically divide a document into sections and to perform searches using that knowledge. For example, a search for documents containing Oracle in the title and Larry Ellison in the first paragraph could be accomplished.
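As a rough illustration of section searching, the Python sketch below splits a plain-text document into a title and numbered paragraphs and then checks a phrase against a named section. The splitting rule (first line is the title, blank lines separate paragraphs) and the function names are my own assumptions, not how interMedia's sectioner works.

```python
def split_sections(document):
    """Divide a document into a title and numbered paragraphs.

    Assumes the first line is the title and blank lines separate paragraphs.
    """
    title, _, body = document.partition("\n")
    sections = {"title": title.strip()}
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs, start=1):
        sections[f"paragraph{i}"] = para
    return sections


def matches(document, criteria):
    """True if every (section, phrase) pair in criteria is found."""
    sections = split_sections(document)
    return all(phrase.lower() in sections.get(section, "").lower()
               for section, phrase in criteria.items())


doc = ("Oracle ships 8i\n\n"
       "Larry Ellison announced the release today.\n\n"
       "Analysts were cautious.")
hit = matches(doc, {"title": "Oracle", "paragraph1": "Larry Ellison"})
miss = matches(doc, {"title": "Oracle", "paragraph2": "Larry Ellison"})
```

The first query succeeds because both phrases appear in the requested sections; the second fails because "Larry Ellison" is not in the second paragraph.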

Stemming is searching for words that have the same root as another word. Oracle has implemented multi-lingual stemming, so the system understands the rules specific to each language. Grammatical stemming (from Xerox) has been implemented as well, so if you search for the word "run," Oracle will return matches for "run," "runs," "running," etc., but it will also return "ran."

Oracle also supports the ISO 2788 standard thesaurus format. The implementation contains synonym rings and hierarchical relationships. Synonym rings let you search for documents containing words similar to your search criteria, and hierarchical relationships classify words into narrower or broader terms to be searched. For example, you can import a thesaurus of medical terms, which would allow a person to search medical documents without having to know precise technical terms.
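A small Python sketch may help show how synonym rings and hierarchical relationships can widen a query. The tiny medical thesaurus and the `expand` function are hypothetical; nothing here reflects Oracle's actual thesaurus API or the ISO 2788 file layout.

```python
# Hypothetical mini-thesaurus of medical terms.
SYNONYM_RINGS = [
    {"heart attack", "myocardial infarction"},
    {"high blood pressure", "hypertension"},
]

# Broader term -> narrower terms (the hierarchical relationships).
NARROWER = {
    "heart disease": ["myocardial infarction", "arrhythmia"],
}


def synonyms(term):
    """Return the synonym ring containing term, or just the term itself."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def expand(term):
    """Expand a term with its synonyms, narrower terms, and their synonyms."""
    expanded = set()
    pending = [term]
    while pending:
        current = pending.pop()
        if current in expanded:
            continue  # already handled, also guards against cycles
        ring = synonyms(current)
        expanded |= ring
        for member in ring:
            pending.extend(NARROWER.get(member, []))
    return expanded


lay_query = expand("heart attack")
broad_query = expand("heart disease")
```

With this data, someone searching for "heart attack" would also match documents that only say "myocardial infarction", and a search on the broad term "heart disease" pulls in its narrower terms too.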

Finally, searches can be done by relevance. Oracle will analyze a document, creating themes out of it which can be searched. Themes are what Oracle believes the document content is about. It does this linguistic analysis by using a large pre-loaded ontology. (The ontology is extensible so you can set it up for your environment.) The ontology is hierarchical, so themes are built up successively and ranked. For example, if Oracle is classifying an article on fruit farming in California, it will examine the word stream, taking note of the word "apple." Oracle creates "potential" themes of "fruit," "farming," "agriculture," etc. Assume then that the next word Oracle parses is "trees." The weight for the farming and agriculture themes would be increased but not the fruit theme. So based on two words, Oracle is saying that this article is potentially about farming and agriculture with a slight emphasis on fruit. This of course is a canned example but you get the idea. As part of theme building, Oracle can also build a "gist," or synopsis, of an article using sentences taken from within the article. The presenter commented that the linguistic analysis did best on general documents like newspaper articles. This was only because the ontology for them was better developed and better understood than for other domains. It also helps if the document uses proper grammar, punctuation, delimiting, etc.
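The apple-and-trees example can be mimicked in a few lines of Python. This is only a toy: the ontology here is a flat, hand-made word-to-themes map with equal weights, whereas the real one is described as large and hierarchical, and `rank_themes` is a name I made up.

```python
from collections import Counter

# Hypothetical ontology fragment: each known word lists the themes it suggests.
ONTOLOGY = {
    "apple": ["fruit", "farming", "agriculture"],
    "trees": ["farming", "agriculture"],
}


def rank_themes(words, ontology=ONTOLOGY):
    """Accumulate one unit of weight per suggested theme and rank the themes."""
    weights = Counter()
    for word in words:
        for theme in ontology.get(word.lower(), []):
            weights[theme] += 1
    return weights.most_common()  # highest weight first


themes = rank_themes(["apple", "trees"])
```

After the two words, "farming" and "agriculture" each carry a weight of 2 and "fruit" a weight of 1, which matches the "slight emphasis on fruit" in the presenter's example.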

To take advantage of some of the above searching techniques, Oracle has introduced the "about" operator. So you can create a query like:

SELECT author, posted, content
FROM articles
WHERE posted > TO_DATE('1999-SEP-29', 'YYYY-MON-DD')
AND author LIKE 'D%'
AND CONTAINS(content, 'Oracle AND ABOUT(Internet)') > 0

which would return all the articles posted after Sep 29, 1999 (relational search), where the author's name begins with a "D" (fuzzy matching), and where the article content contains the word "Oracle" and is about the "Internet". The article does not have to contain the word "Internet"; only its theme has to match.

All of the indexing is done automatically with a four-step process. First, the text (in the database, linked by a URL, or accessible through FTP) is converted to HTML using INSO's HTML Export utility or a user-defined filter. Second, a sectioner breaks the document into logical parts like paragraphs, titles, footers, etc. and produces one of plain text, XML, HTML, or News/E-mail (RFC 1036). Third, the resulting text is passed through a lexer, which produces a tokenized document. This basically locates the word boundaries in a document and is more important for languages like Japanese and Korean (whitespace locating) and German (word decompounding). Finally, the tokens are passed to an engine that removes stop words and produces the final index.
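The four steps above can be sketched as a toy pipeline in Python. Every function here is a stand-in of my own: the "filter" just decodes bytes instead of producing HTML, the sectioner only knows about a title line and a body, the lexer splits on word characters, and the stop word list is illustrative.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in"}  # illustrative list


def filter_document(raw_bytes):
    """Step 1 (stand-in): convert the stored format to text."""
    return raw_bytes.decode("utf-8")


def sectioner(text):
    """Step 2: split into logical parts; the first line is taken as the title."""
    title, _, body = text.partition("\n")
    return {"title": title, "body": body}


def lexer(text):
    """Step 3: locate word boundaries and emit lower-cased tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Step 4: inverted index of token -> {(doc_id, section)}, minus stop words."""
    index = defaultdict(set)
    for doc_id, raw in documents.items():
        for section, text in sectioner(filter_document(raw)).items():
            for token in lexer(text):
                if token not in STOP_WORDS:
                    index[token].add((doc_id, section))
    return dict(index)


docs = {
    1: b"Oracle and the Internet\nThe database is in the middle of a Web strategy.",
    2: b"Java notes\nOracle embeds a JVM in the database.",
}
index = build_index(docs)
```

Looking up `index["oracle"]` shows the word in the title of document 1 and the body of document 2, while stop words such as "the" never reach the index.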

interMedia text is the most mature part of the whole product, having been developed over ten years in various forms and under different names.

Java

Oracle's strategy for Java has been their so-called "300% Java Solution." What this means is enabling Java in the client, the middle-tier, and the backend. To this end, Oracle has embedded a Java Virtual Machine into its Application Server as well as the Oracle8i database. It is up to you, based on performance characteristics, etc., to decide where you want your Java code to run.

Accessing data from Java can be done in many ways that are all based on open standards. There are three different JDBC access methods: JDBC/OCI (tight integration with Oracle), a thin JDBC driver, and a server-side JDBC driver. Access is also provided through SQLJ, developed by IBM, Oracle, and Sun, which is like embedded SQL in Java. And finally, there is InfoBus, which couples Swing components with data from the database.

Since the JVM is built into the database and runs as part of the database kernel, no separate processes are created for the JVM. This means that each concurrent user will use up about 50KB of memory in comparison to the 4-8MB usually needed. Also, the tight SQL integration yielded near-linear speedup under testing (results found at TechNet) for up to 500 concurrent Java users. To improve scalability, each Java session is assigned its own virtual JVM with its own global variables and threads, along with a completely rewritten garbage collector for efficiency. Java code can also be compiled using what Oracle has nicknamed "way ahead of time" compilation versus the traditional "just in time" compilation. Another nice feature of the Application Server is that it will allow profiling of Java code down to the method. This will let you find out where your Java applications need tuning.
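To put the per-user memory figures in perspective, here is the back-of-the-envelope arithmetic in Python. Combining the 50KB and 4-8MB figures with the 500-user test size is my own assumption for illustration; the talk did not present totals like these.

```python
KB = 1024
MB = 1024 * KB


def total_memory_mb(users, per_user_bytes):
    """Total session memory in MB for a given number of concurrent users."""
    return users * per_user_bytes / MB


users = 500  # the upper end of the scalability test mentioned above

embedded = total_memory_mb(users, 50 * KB)      # JVM inside the database
separate_low = total_memory_mb(users, 4 * MB)   # conventional JVM, low estimate
separate_high = total_memory_mb(users, 8 * MB)  # conventional JVM, high estimate
```

Under these assumptions the embedded JVM needs roughly 24 MB in total for 500 users, against 2000 to 4000 MB if every user needed 4-8MB.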

To aid in creating Java for the database, Oracle's JDeveloper 3 product will allow you to write, debug, and publish your Java code from within its environment to any tier of your system. JDeveloper will produce Java Stored Procedures, Servlets, Enterprise Java Beans, and anything else Java. With Oracle8i you also get a built-in ORB for CORBA developers, and with JDeveloper, Java code can be made accessible via IIOP as well as published to the database. The development environment will also do a lot of the setup work for you. For example, it will create a standard EJB framework for you to plug your code into. Oracle is also releasing Business Components for Java, which will allow people to graphically model business processes and data and have an EJB produced as a result. This will allow separation of business rules from actual code and will allow straightforward reuse of components.

XML

Everybody is jumping on the XML bandwagon and Oracle is no different. The Oracle8i server comes embedded with an XML Parser, an XSLT Processor, and XML SQL utilities.

Getting XML-formatted data out of the Oracle database is as simple as using SQL. There is an XML SQL Utility that can be called from the command line, Java, or PL/SQL, or you can use the XSQL Servlet, which allows XML pages with embedded SQL commands to be processed. Oracle also has a Java Class Generator, which will generate Java classes from a DTD. The Java classes can then be used to programmatically construct XML documents from the database.
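The idea of turning a query result into XML can be sketched in Python, using the standard library's sqlite3 as a stand-in for Oracle. The `query_to_xml` function and the `articles` table are hypothetical, and the ROWSET/ROW element names are an assumption about the general shape of such output rather than a reproduction of the XML SQL Utility.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn, sql):
    """Run a query and wrap each result row in a <ROW> element."""
    cursor = conn.execute(sql)
    columns = [d[0] for d in cursor.description]
    rowset = ET.Element("ROWSET")
    for num, row in enumerate(cursor, start=1):
        row_el = ET.SubElement(rowset, "ROW", num=str(num))
        for name, value in zip(columns, row):
            # One child element per column, named after the column.
            child = ET.SubElement(row_el, name.upper())
            child.text = "" if value is None else str(value)
    return ET.tostring(rowset, encoding="unicode")


conn = sqlite3.connect(":memory:")  # in-memory stand-in database
conn.execute("CREATE TABLE articles (author TEXT, title TEXT)")
conn.execute("INSERT INTO articles VALUES ('Dave', 'iDevelop notes')")
xml_text = query_to_xml(conn, "SELECT author, title FROM articles")
```

The result is a single ROWSET element holding one ROW per record, with AUTHOR and TITLE children carrying the column values.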

Getting data into Oracle from an XML format is also fairly simple. Documents can be passed directly to the XML Parser, inserted with the XML SQL Utility, or dragged and dropped into Oracle's Internet File System. Documents can be parsed and stored as a whole, or tables can be created and rows inserted from their contents. For ease of use, tables will automatically be created with columns based on the XML tag names, but this can be overridden to provide flexibility. In both forms, the XML data can be searched, either using straight SQL in the case of tables or using interMedia's text searching abilities in its unstructured form.
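Here is a minimal Python sketch of the "columns based on the XML tag names" idea, again with sqlite3 standing in for Oracle. The `load_xml` function, the table name, and the sample document are all made up, and the sketch assumes every row element has the same child tags as the first one.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_xml(conn, table, xml_text):
    """Create a table whose columns are the child tags of the first row
    element, then insert one record per row element."""
    rows = list(ET.fromstring(xml_text))
    columns = [child.tag for child in rows[0]]
    # Quote identifiers since they come from the document; a sketch only,
    # it does not handle names that themselves contain double quotes.
    quoted = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({quoted})')
    placeholders = ", ".join("?" for _ in columns)
    for row in rows:
        values = [row.findtext(c) for c in columns]
        conn.execute(
            f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})',
            values,
        )
    return columns


document = """
<articles>
  <article><author>Dave</author><title>iDevelop notes</title></article>
  <article><author>Dana</author><title>XML in the database</title></article>
</articles>
"""
conn = sqlite3.connect(":memory:")  # in-memory stand-in database
created = load_xml(conn, "articles", document)
stored = conn.execute(
    "SELECT author, title FROM articles ORDER BY author"
).fetchall()
```

Once loaded this way, the data can be queried with ordinary SQL, which is the "straight SQL in the case of tables" route described above.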

For further information on Oracle technologies, check out TechNet. Registration is free.