Jérôme De Cuyper

September 22, 2011

“Microsoft.ACE.OLEDB.12.0” for linked server “(null)” returned message “Unspecified error”

Filed under: MSSQL — Tags: , — jdecuyper @ 3:22 pm

An OLE DB provider is a high-level abstraction that gives access to data stored in a simple text file, an Excel sheet or other kinds of documents that are not necessarily accessible through a language such as SQL. Microsoft designed this technology to replace the well-known ODBC standard. Interestingly, the Redmond company recently announced that it is aligning with ODBC again for native relational data access!

Whoever has worked with OLE DB and Excel files knows that it easily becomes a big pain :) ! Most of the time I try to avoid it, but in specific scenarios it is unavoidable. If you want to quickly dump data from an Excel file into a table with a single SQL statement, it goes as follows:

INSERT INTO tblMySpecialData SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0;Database=C:\HTTP\Excel\MyDataShhet.xls;HDR=YES;IMEX=1',
'SELECT [NAME], [ADRESS] FROM [Sheet1$]')

For the record: “HDR=YES;” indicates that the first row contains column names rather than data, and “IMEX=1;” tells the driver to always read “intermixed” data columns as text. You need to know the name of the sheet in order to read data from it. I use ADO.NET to get the name of the first sheet and pass it to the stored procedure.
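The post does not show the ADO.NET code that retrieves the sheet name, so the following is only a minimal sketch of one way to do it, assuming the ACE provider is installed. The class and method names are made up for illustration. Note that the provider is not guaranteed to return the sheets in workbook order, so "first" here means the first row of the schema table.

```csharp
using System;
using System.Data;
using System.Data.OleDb;

public static class ExcelSheets
{
    // Builds an ACE connection string with the same extended properties as the query above.
    public static string BuildConnectionString(string path)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
               ";Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1\"";
    }

    // Returns the first sheet name reported by the provider, for example "Sheet1$".
    public static string GetFirstSheetName(string path)
    {
        using (var connection = new OleDbConnection(BuildConnectionString(path)))
        {
            connection.Open();
            // Each row of the schema table describes one sheet or named range.
            DataTable tables = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
            if (tables == null || tables.Rows.Count == 0)
                throw new InvalidOperationException("No sheets found in " + path);
            return (string)tables.Rows[0]["TABLE_NAME"];
        }
    }
}
```

The returned name can then be wrapped in square brackets and passed to the stored procedure as the table name of the inner SELECT.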

Among all the errors you can receive when executing this command, one of the most common is:

[Screenshot of the error: “Microsoft.ACE.OLEDB.12.0” for linked server “(null)” returned message “Unspecified error”]

The command ran without error from within the stored procedure, but it just wouldn’t work when executed from SQL Server Management Studio! I’m running a Windows 7 PC and the issue was solved by opening Management Studio with administrative rights! Hope it helps someone :)

January 17, 2011

FluorineFx – Could not find a suitable method with name {0}

Filed under: ActionScript, C#, English, FluorineFx — jdecuyper @ 10:18 pm

FluorineFx is a free and open source Flash/Flex remoting gateway that allows easy and efficient communication between an Adobe Rich Internet Application and a Microsoft .NET back end.

I have been using FluorineFx on several projects since 2007, such as the following website, which offers games, contests and customizable pages (MySpace-like but simpler :) ) for Spanish-speaking kids:

pandilla_telmex

Using the Action Message Format (AMF), FluorineFx matches a remote call from a client to a C# method. Not only are method names matched but also their respective arguments, and even custom classes can be passed around. A conversion table from AS types to C# types is available in FluorineFx’s online documentation.

So, for example, on the client side you have an AS2 class with a “GetSomething()” remote method and two callback methods (“onSuccess” and “onError”) to process the results or any potential error. If results are received, a dispatcher fires a new event in order to notify all the objects listening to this particular event:

import mx.events.EventDispatcher;
import mx.remoting.*;
import mx.rpc.*;

class com.myCustomClass {

    var __gatewayUrl:String;
    var __pendingCall:PendingCall;
    var __service:Service;

    // placeholders that EventDispatcher.initialize() replaces at runtime
    function dispatchEvent() {};
    function addEventListener() {};
    function removeEventListener() {};

    function myCustomClass(gateway:String) {
        __gatewayUrl = gateway;
        mx.events.EventDispatcher.initialize(this);
    }

    // calls the remote C# method and routes the response to the callbacks below
    function GetSomething():Void {
        __service = new Service(__gatewayUrl, null, "MyBackend.MyCustomClass");
        __pendingCall = __service.GetSomething();
        __pendingCall.responder = new RelayResponder(this, "onSuccess", "onError");
    }

    // notifies every listener registered for the "fecha" event
    function onSuccess(re:ResultEvent):Void {
        var eventObj:Object = {target:this, type:"fecha"};
        eventObj.rs = String(re.result);
        dispatchEvent(eventObj);
    }

    // traces every detail of the fault returned by the gateway
    function onError(fe:FaultEvent):Void {
        var getSomethingFault:Fault = fe.fault;
        trace(getSomethingFault.type);
        trace(getSomethingFault.description);
        trace(getSomethingFault.detail);
        trace(getSomethingFault.faultstring);
        trace(getSomethingFault.faultcode);
    }
}

And, symmetrically, on the C# side you have a similar class marked with the “RemotingService” attribute which tells the Fluorine engine that this class should be considered when an AMF request comes in:

using System;
using FluorineFx;

[RemotingService("MyCustomClass")]
public class MyCustomClass
{
    public MyCustomClass() { }

    // the method may or may not be static
    public string GetSomething()
    {
        // do some work
        return "result";
    }
}

Note that it works the same with an AS3 class, with some minor changes. I forked and changed an ActionScript library called FluorineFxNetClient on GitHub. It makes it easy to connect AS3 classes to a C# back end and avoids repetitive code such as the whole dispatching process you saw in the AS2 class above.

So, what does FluorineFx do exactly? Quite a bit: it reads all HTTP requests with an AMF content type, tries to find the corresponding method, executes it and returns its results (if any) to the client side. By design, all AMF requests point at an ASPX page called “gateway.aspx”. Requests to this page are actually handled by an HttpModule, which is declared in your web.config the following way:

<httpModules>
<add name="FluorineGateway" type="FluorineFx.FluorineGateway, FluorineFx" />
</httpModules>

Once the request comes in, a linked list is created in which each node contains a different kind of filter. The filters are executed one after the other on the AMF request until a match is found. Here is an image of all the available filters; note that all of them are used except for the “CacheFilter”, which is left out in the latest repository update (revision 209):

listOfFilters

In my case, the most interesting filter is the “ProcessFilter” class, where the lookup for the actual C# method starts. If a corresponding method is found, an object of the requested type is created through C#’s reflection mechanisms and the method is executed with the necessary arguments. If the method is not found, an exception is thrown that reads as follows: “Could not find a suitable method with name {0}”. In most cases, the lookup works perfectly. However, on a heavier application I started to receive errors from that particular filter. By heavy, I mean something around 60k hits per month, which is really not that much but is a lot relative to my small experience as a web programmer :) .

I enabled full debug messages and found out that some methods are sometimes executed perfectly and then, a couple of seconds later, are reported as not found by the FluorineFx engine! So somehow, in some strange circumstances, existing methods are not found inside a requested type. The problem was cracked by user zieDaniel1 on the FluorineFx forum:

The problem is a race condition in FluorineFx/Messaging/Endpoints/Filter/ProcessFilter.cs, lines 152-156. The FactoryInstance that is returned from destination.GetFactoryInstance() on line 152 is shared between threads, and what often happens is that factoryInstance.Source is overwritten by another thread by the time factoryInstance.Lookup() is called! Then, the wrong class is used to find the remote method on, and the method cannot be found.

The “FactoryInstance” class is responsible for the lookup procedure. It is created only once and thus shared among threads. Since the log message only reported the name of the method that could not be found, I added the name of the object on which the lookup was performed. And surprise: when the error is thrown, the method is being looked up inside the wrong object!

Before applying any patch to the source code, I wanted to be able to reproduce the error locally. I created a small C# application that sends AMF HTTP requests to a remote FluorineFx gateway. I could have done it with a Flash application but I found it more flexible to work exclusively with C#. Since I thought that the exception would easily be thrown under heavy load, I started firing a bunch of AMF requests in a loop from different threads. After a couple of tries, the error was still not showing up. I changed my approach a bit and decided to put some threads to sleep for half a second based on the name of the method they were looking for. This would give another thread enough time to overwrite the value of factoryInstance.Source:
ThreadToSleep

That did the trick: the error was thrown on every execution of the C# client application! The next step was to apply the patch proposed by zieDaniel1. The main idea is simply to lock access to the shared instance of “FactoryInstance”. Since a lock can become expensive, I decided to create a small benchmark in order to estimate the performance cost. The lock around the “FactoryInstance” goes as follows:

object instance;
FactoryInstance factoryInstance = destination.GetFactoryInstance();
// only one thread at a time may set Source and perform the lookup
lock (factoryInstance)
{
    factoryInstance.Source = amfBody.TypeName;
    if (FluorineContext.Current.ActivationMode != null) // query string can override the activation mode
        factoryInstance.Scope = FluorineContext.Current.ActivationMode;
    instance = factoryInstance.Lookup();
}

I benchmarked both DLLs (with and without the lock) locally and on a remote server but was not able to find any relevant difference between them. I used about 100 methods and recorded the time elapsed between each call and its response. Each method was called fifty times. Results vary very little, between 0 and 25 milliseconds, nothing to be afraid of. The average time is 78 milliseconds with the lock and 75 milliseconds without it. The results are clearer when visualized as a chart:

FluorineFxWithWithouLock

The X axis represents the fifty executions of all methods. The corresponding value on the Y axis is the average execution time for the 100 methods. Since no relevant performance hit could be detected, I decided to push the new version of FluorineFx to our production server. Since then, no error of this kind has been logged! I submitted a patch to FluorineFx’s mailing list but it seems that changes are no longer being made to the source code. So if you are experiencing the same issue, you can leave a message and I will be glad to help you out.
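The race described above can be reproduced outside FluorineFx with a few lines of C#. This is only a sketch: “SharedFactory” is a hypothetical stand-in for FactoryInstance, and the one-millisecond sleep plays the same role as the half-second sleep in my test client. With the lock, every thread gets back the type name it asked for; without it, the count of mismatches varies from run to run.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for FluorineFx's FactoryInstance: one instance shared by all threads.
public class SharedFactory
{
    public string Source { get; set; }

    public string Lookup()
    {
        Thread.Sleep(1); // widen the window in which another thread can overwrite Source
        return Source;
    }
}

public static class RaceDemo
{
    // Runs concurrent lookups and returns how many of them resolved the wrong type name.
    public static int CountMismatches(bool useLock, int threads)
    {
        var factory = new SharedFactory();
        int mismatches = 0;
        var workers = new Thread[threads];

        for (int i = 0; i < threads; i++)
        {
            string typeName = "Type" + i;
            workers[i] = new Thread(() =>
            {
                string resolved;
                if (useLock)
                {
                    // setting Source and calling Lookup happen as one unit
                    lock (factory)
                    {
                        factory.Source = typeName;
                        resolved = factory.Lookup();
                    }
                }
                else
                {
                    factory.Source = typeName;
                    resolved = factory.Lookup();
                }
                if (resolved != typeName)
                    Interlocked.Increment(ref mismatches);
            });
        }

        foreach (var worker in workers) worker.Start();
        foreach (var worker in workers) worker.Join();
        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine("Mismatches without lock: " + CountMismatches(false, 20));
        Console.WriteLine("Mismatches with lock:    " + CountMismatches(true, 20));
    }
}
```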

December 21, 2010

Compile and run mono 2.8.1 on Windows XP and Vista

Filed under: English, Mono — jdecuyper @ 11:06 am

Compiling mono on Windows is a bit more complicated than on Linux. Fortunately, thanks to a post by Andreia Gaita, I got there and back again “while remaining reasonably sane”. I decided to write down my own path to compiling mono on Windows in order to refresh the details about the cygwin dependencies.

The key part of the compilation is to have a correctly configured cygwin installed and to grab a mono source tree that is ready to compile (i.e. with no errors; the latest clone from GitHub may sometimes contain problems). After a lot of compilation attempts, I learned that the best method was, after each failed build, to completely remove cygwin from my system and make a clean install. It was a bit of a hassle at first, but hopefully you will not have to go through all this and will have your mono build environment ready in a couple of hours.

1. Get cygwin: download the latest cygwin executable. I’m running version 1.7.7-1. To find out about the version you are working with, type “uname -a” inside cygwin’s console.

2. Get mono source code: download the latest source code from GitHub or grab it from the stable source code page. I would recommend starting your first compilation with the stable source code just to get it working. Later, you can work with a fresh copy from GitHub.

3. Install mono: download the Windows installer and run it. I installed mono at “C:\mono” in order to avoid problems with spaces in directory names.

4. Install cygwin & packages: run the cygwin installer. In addition to the default packages, install the following ones:

automake v. 1.11
bison v. 2.4.2-1
gcc-core: C compiler v. 3.4.4
gcc-g++: C++ compiler v. 3.4.4
gdb: GNU debugger v. 6.8-2
gettext (version 0.17-11 was already included)
intltool v. 0.41.1-1
libiconv v. 1.13.1-1
libiconv2 (version 1.13.1-1 was already included)
libtool v. 2.4-1
pkg-config v. 0.23b-10
libintl-2 v. 0.12.1-3 (make.exe requires it)

Do not install cygwin’s make.exe. Mine was not included in the default packages, but you never know, so check that it is not selected. Instead, download the non-buggy version (3.80-1) of make.exe from the mono website and add it to cygwin’s bin directory:

cygwin_make

5. Configure cygwin’s path: when you launch the cygwin console for the first time, 3 files are created with your default preferences (.bashrc, .inputrc, .bash_profile). On my Windows XP box, those files are created at “C:\cygwin\home” and on my Windows Vista box at “C:\cygwin\etc\skel”. Edit the .bashrc file and add at the end of the file the following lines:

PATH=.:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/cygdrive/c/mono/bin
PKG_CONFIG_PATH=.:/lib/pkgconfig:/cygdrive/c/mono/lib/pkgconfig
LD_LIBRARY_PATH=.:/usr/local/lib:/usr/lib:/lib:/cygdrive/c/mono/lib
export PATH PKG_CONFIG_PATH LD_LIBRARY_PATH

Don’t forget to replace “/cygdrive/c/mono/bin”, “/cygdrive/c/mono/lib/pkgconfig” and “/cygdrive/c/mono/lib” with the correct location of the mono installation on your system.

6. Launch compilation: copy the mono source to “C:\cygwin\usr\src”. Add mono’s bin directory (“C:\mono\bin”) to the Windows path. This is important because the compilation process requires an existing gmcs compiler to complete successfully. Now launch cygwin’s console and, once inside the source directory, execute:

./autogen.sh --prefix=/usr/local && make && make install

If all goes well, the mono binaries will be created and copied to “C:\cygwin\usr\local”. Now remove “C:\mono\bin” from your path, add “C:\cygwin\usr\local\bin” and “C:\cygwin\usr\local\lib”, and start coding :) .

mono_gmcs

November 9, 2010

Compile and run mono 2.8.1 on openSUSE

Filed under: English, Mono — jdecuyper @ 6:41 pm

I never mentioned this in an earlier post, but since January 2009 I have been studying for a master’s degree in Computer Science at a Mexican university. If everything goes well, I will graduate right around Christmas 2011!

A couple of weeks ago, I started to design and write my thesis. My initial idea was to work on a garbage collection algorithm. More precisely, I wanted to design a GC based on reference counting and implement it inside the mono project. After chatting a bit with Paolo Molaro (one of Mono’s original designers), I understood how ambitious my project was :) . UPDATE: my thesis topic had to be changed, mainly because the new one lets me save money on the last courses I need in order to graduate. However, I’m planning to submit the new GC generation project to the Google Summer of Code.

Instead of creating a new GC algorithm, Paolo presented me with a more suitable task: “we currently have 2 generations: introduce a third one that sits between the nursery and the old generation. It should be a bump-pointer style generation for fast gen0 collections.” Although more concise, this task also holds a lot of surprises. So, I decided to use my thesis to implement a new generation of objects between gen-zero (“the nursery”), where most of the program’s objects are first allocated, and the older generations. I will post my progress bit by bit on this blog.

As a first contact with the mono project, I downloaded the complete source, compiled it and ran a simple program against it. Since I have Windows installed, I’m using a virtual machine to run openSUSE.

1) Run the autogen.sh script with the necessary parameters. The “prefix” argument indicates the directory where the compiled project will be installed.

./autogen.sh --with-large-heap=yes --with-profile4=yes --prefix=/opt/mono --with-sgen=yes

mono-autogen

2) Once the files are ready, you will be prompted to type “make”:

mono-make

3) And finally type “make install” to export all files to the output directory:

mono-make-install

To use your compiled source from MonoDevelop, add a new default .NET runtime for your project:

mono-runtime

August 17, 2010

Retrieve return value from executable inside a C# application

Filed under: C#, English — Tags: — jdecuyper @ 8:47 pm

Recently, I had some difficulty reading the return value of an executable file. I was trying to call SpamAssassin from within a C# console application and couldn’t retrieve its return value. Since the exit code indicates whether an email is spam or ham, I had to find a way to receive that value. After struggling a bit, I found out that instead of calling SpamAssassin directly, I first had to call cmd.exe and hand it the path to the SpamAssassin executable as an argument. The code ended up as follows:

Process p = new Process();
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.UseShellExecute = false;
p.StartInfo.Arguments = @" /C C:\spamassassin.exe -e -L < C:\SPAM_TEST.MAI";
p.StartInfo.FileName = @"C:\WINDOWS\System32\cmd.exe";
p.OutputDataReceived += (sender, arguments) => Console.WriteLine("Received output: {0}", arguments.Data);
p.Start();
p.BeginOutputReadLine();
p.WaitForExit();
Console.WriteLine("Exit code: " + p.ExitCode);
p.Close();

Here is a screenshot of the console after calling cmd.exe with a ping command:

launch_exe_from_console

August 15, 2010

Query CouchDB from a C# application

Filed under: C#, English — Tags: — jdecuyper @ 10:17 pm

For a couple of years now, I have been following Damien Katz’s blog. I subscribed to it mostly because I was fascinated by his story about how he came to rewrite the Lotus Notes Formula Engine. In a couple of months, he completely rewrote a programming language and made the engine more than 3 times faster!

Damien Katz also happens to be the founder and CEO of CouchDB. For those who are not familiar with this technology, here is a brief description: CouchDB is a free and open source document-oriented database engine. It was designed mostly to serve as a database for web applications. Traditionally, a relational database management system (RDBMS) presents the data to the user as relations, i.e. as a set of tables, each consisting of a collection of rows and columns. In contrast, CouchDB stores data as a collection of JSON documents. For example, here is how a document about different chocolates is stored on my local server:

JSON document

Since data is stored as a flat collection of documents without any schema describing its internal relations, the engine is capable of storing large-scale objects. But how can semi-structured data (i.e. without relations) be queried? Views allow you to define functions that query the documents of a database without affecting the underlying documents. For example, the simplest view returns a table containing all the documents in a database. The function goes as follows:

function(doc) {
emit(null, doc);
}

And outputs my chocolate document:

{"total_rows":1,"offset":0,"rows":[
{"id":"chocolates","key":null,"value":{"_id":"chocolates","_rev":"2-4066442969","name":null,"cacao_porcentage":null,"country":null}}
]}

Since CouchDB was built on the web, it’s no surprise that it has its own HTTP API for querying it. If you are a heavy Windows user like me, you will need to swap your old query analyzer for your browser. This is what makes CouchDB a fresh and interesting abstraction. Everything (or almost) will now be done using the HTTP API. As Jeff Atwood once pointed out, the web browser address bar is becoming the new command line. This is so true with CouchDB that it’s almost scary!

Since CouchDB recently hit 1.0, I was tempted to give it a try using a simple C# application. First you will need to download the latest build. I got mine from a question on Stack Overflow but you can also download it from the official wiki. Although CouchDB is able to run on Windows, it is not yet officially supported. Once you have the files on your computer, you’re almost done! If you downloaded the first build mentioned, you only need to launch the couch_start.bat file inside the bin directory and the server will be loaded in seconds:

couchdb_start

The console should now appear and tell you to relax:

couchdb_console

You should now be able to navigate to your server at: http://127.0.0.1:5984/_utils. In order to query your CouchDB server from C#, you will need a client API. Several APIs already exist. I gave SharpCouch a try. I attached a small C# console project that uses SharpCouch to list all the databases available on your server:

couchDB_list_db
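For readers who prefer not to download the project, the same listing can be obtained by calling the HTTP API directly. The sketch below does not use SharpCouch; it relies on HttpClient and System.Text.Json from a recent .NET version, and the class and method names are my own. It assumes the server from this post is running at http://127.0.0.1:5984 and uses CouchDB’s “_all_dbs” resource, which answers with a JSON array of database names.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class CouchDbClient
{
    // GET /_all_dbs returns a JSON array such as ["_users","chocolates"].
    public static Uri BuildAllDbsUri(string serverUrl)
    {
        return new Uri(new Uri(serverUrl), "_all_dbs");
    }

    // Turns the JSON array returned by the server into a string array.
    public static string[] ParseDatabaseNames(string json)
    {
        return JsonSerializer.Deserialize<string[]>(json) ?? new string[0];
    }

    public static async Task<string[]> ListDatabasesAsync(string serverUrl)
    {
        using (var http = new HttpClient())
        {
            string json = await http.GetStringAsync(BuildAllDbsUri(serverUrl));
            return ParseDatabaseNames(json);
        }
    }

    public static async Task Main()
    {
        // Requires a running CouchDB server at this address.
        foreach (var name in await ListDatabasesAsync("http://127.0.0.1:5984/"))
            Console.WriteLine(name);
    }
}
```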

July 13, 2010

Older version of StringEnum throws error when accessed by multiple threads

Filed under: C#, English — Tags: , , — jdecuyper @ 4:30 pm

A C# enumeration is a set of named constants. Every member of an enumerator list has an underlying integer value (of type int by default) that starts at zero and, unless specified otherwise, increases by one for each following member:

enum WebsiteType {
Blog = 0,
News = 1,
Youtube = 2  };

Which is completely equivalent to:

enum WebsiteType {
Blog,
News,
Youtube  };

But not to:

enum WebsiteType {
Blog = 2,
News = 4,
Youtube = 6  };
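A quick way to check these rules is to cast the members back to integers. This small sketch declares the implicit and the explicit variants side by side; “ShiftedWebsiteType” is just a name I picked so that both can live in the same file.

```csharp
using System;

public enum WebsiteType { Blog, News, Youtube }
public enum ShiftedWebsiteType { Blog = 2, News = 4, Youtube = 6 }

public static class EnumValues
{
    public static void Main()
    {
        Console.WriteLine((int)WebsiteType.News);        // prints 1: implicit numbering starts at 0
        Console.WriteLine((int)ShiftedWebsiteType.News); // prints 4: the explicit value is kept
    }
}
```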

Enumeration members can only be associated with integral values; if you want to associate an enum value with a string, you need to enhance the enumeration’s behavior. To achieve this, I have been using the “StringEnum” class from the Code Associate .NET project for quite a while now. StringEnum is a helper class that allows you to associate string values with enum values, such as:

public enum Animal {
[StringValue("Domestic cat")]
CAT = 0,
[StringValue("Domestic dog")]
DOG = 1,
[StringValue("Domestic golden fish")]
FISH = 3
}

When you need to retrieve the string associated with a specific value of the enumeration, you can do the following:

string enumValue = "";
enumValue = StringEnum.GetStringValue(Animal.CAT);

It offers very practical behavior that I have used in a lot of different projects. Nevertheless, I recommend that you use the up-to-date version of the class, since the one I have been using is not thread safe. I downloaded it a while ago and never bothered to check for a newer version.

Recently, I created a small ASP.NET image gallery for a contest around the anniversary of the Mexican revolution and independence. People could upload an image and vote for an image. I configured log4net in such a way that every error that got caught was sent to me by email. That is how I discovered that an error is sometimes raised when different threads are working with the StringEnum class. I found it quite interesting since I have been using log4net and StringEnum on a lot of projects and the error had never been thrown before.

For performance reasons, the StringEnum class holds a static hashtable with all the string values from the enumeration. This helps reduce the lookup cost caused by reading the enumerator’s attributes. When you request the StringValue of a particular enumeration value, the StringEnum class queries its hashtable. If it finds the value, it returns it; otherwise it adds the newly requested value.

The problem is that the hashtable is a static field, which means it is shared by every caller of the StringEnum class. Since there is no control over insertions into the hashtable, a “System.ArgumentException” could be thrown when several threads were accessing it:

error_stringEnum

Although the error message is shown in Spanish, it is quite clear that a thread was trying to add a value to the hashtable that already existed! This is not a big surprise since the MSDN documentation states that a “Hashtable is thread safe for use by multiple reader threads and a single writing thread”. When working with multiple writing threads, things start to go wrong.

I uploaded a small Visual Studio 2008 solution with a performance test that I applied to the old version of the StringEnum class, to the new one, and also to a custom version I wrote. Before writing to the hashtable, every thread must acquire a lock, which prevents concurrent writes. We need to make sure that while a thread is writing a new value to the hashtable, no other thread can come in and do the same thing. This is where you need to use a lock.

The new version of StringEnum from Code Associate locks directly on the hashtable. Although this approach works, I prefer to use an additional variable to lock on. Creating “lock variables” makes your code a bit cleaner and prevents two unrelated pieces of code from blocking on the same object. In the words of Jon Skeet: “I believe this is a bad idea (locking on a instance), because it means you have less control over your locks. Other code may well end up locking on the same object as you do within your code, which makes it far harder to ensure that you only obtain locks in an appropriate order”.

If you try out the solution, you will see that the old version of StringEnum (CA_old directory) fails while the newer version (CA_new directory) and mine (CA_custom directory) pass the test without throwing an exception. Finally, if you look through my code you will see that I added a line inside every StringEnum class that forces the thread to sleep for 12 milliseconds. I added this line in order to provoke a read/write error between threads more quickly. And for those who prefer not to download the project, the method looks like the following, where _stringValue is our hashtable:

GetStringValueWithSleep
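The idea of locking on a dedicated variable can also be sketched in text form. This is not the Code Associate source nor the exact code from my solution: the names “SafeStringEnum” and “_sync” are made up for the example, and for simplicity the lock also covers reads, which serializes them.

```csharp
using System;
using System.Collections;
using System.Reflection;

[AttributeUsage(AttributeTargets.Field)]
public class StringValueAttribute : Attribute
{
    public string Value { get; private set; }
    public StringValueAttribute(string value) { Value = value; }
}

public enum Animal
{
    [StringValue("Domestic cat")] CAT = 0,
    [StringValue("Domestic dog")] DOG = 1
}

public static class SafeStringEnum
{
    private static readonly Hashtable _stringValues = new Hashtable();
    private static readonly object _sync = new object(); // dedicated lock object

    public static string GetStringValue(Enum value)
    {
        // Checking the cache and adding to it happen under the same lock,
        // so two threads can never add the same key.
        lock (_sync)
        {
            if (_stringValues.ContainsKey(value))
                return (string)_stringValues[value];

            string result = null;
            FieldInfo field = value.GetType().GetField(value.ToString());
            if (field != null)
            {
                var attrs = field.GetCustomAttributes(typeof(StringValueAttribute), false) as StringValueAttribute[];
                if (attrs != null && attrs.Length > 0)
                    result = attrs[0].Value;
            }
            _stringValues.Add(value, result);
            return result;
        }
    }
}
```

Calling SafeStringEnum.GetStringValue(Animal.CAT) reads the attribute through reflection the first time and answers from the hashtable afterwards.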

Download the 2008 Visual Studio solution with the test case.

May 1, 2010

DotNetWinService version 1.0.0.0 released

Filed under: DotNetWinService — Tags: , , — jdecuyper @ 10:46 am

I have just released version 1.0.0.0 of DotNetWinService on CodePlex. I chose CodePlex because its source code browser is quite practical and clear for consulting the project’s files:

DotNetWinServiceCodePlexBrowsesource

Source file browser

DotNetWinService is a mix of several technologies, among them log4net, Spring.NET and Quartz.NET. It lets you declaratively (with XML) implement scheduled tasks inside a Windows service. The tasks are defined inside the spring-objects.xml file:

spring-context.xml file

spring-objects.xml file

For now, 4 types of tasks are available:

TaskURL, to execute an HTTP request: when working on an ASP.NET website, it is sometimes more convenient to keep all the business logic inside a page that is executed at a regular interval. It can be used, for example, to generate a report and send it by email once a week.

TaskEXE, to execute an EXE or BAT file: the process is launched only if no other instance of it is already loaded in memory; several instances of the same process cannot run at the same time. It can be used for heavier tasks such as processing images or generating reports that take more computing time.

TaskMethod, to trigger a method inside an assembly: only primitive types and strings (System.String) are supported. Instead of having an EXE file, the assembly is loaded in memory and a method (static or instance) is executed through reflection.

TaskMethodInterop, to exchange data between two methods in the same assembly or in different assemblies: the return value of the first method becomes the input value of the second method. The parameter is of type System.String.

The project is developed with C# and Visual Studio 2008. Feel free to comment or contribute code.

April 9, 2010

Why is concatenating strings a very, very bad practice?

Filed under: C# — Tags: , — jdecuyper @ 7:06 pm

A few days ago Kirill Osenkov, an engineer on the Microsoft Visual Studio C# IDE team, posted an interview question that he found interesting:

In a given .NET string (in Unicode), assume there are standard line breaks of the form \r\n (the equivalent of Environment.NewLine).
Write a method that inserts a space between two consecutive line breaks in order to separate them.

It caught my attention, first, because in August 2009 I had taken part in the interview process to work as an intern in Redmond. I did not receive an offer, but I will try again this year :) . And second, because since those interviews I have grown very fond of this kind of technical question about data structures, recursion and so on. Since I had found the post shortly after it was published, I had time to write one of the first answers. It has many flaws but also contains some interesting elements:

// character codes for '\r' and '\n'
const int CARRIAGE_RETURN = 13;
const int NEW_LINE = 10;

string text = "hello \r\n\r\n\r\n world!";
string textResult = "";
char[] caFromText = text.ToCharArray();
for (int i = 0; i < caFromText.Length; ++i)
{
    textResult += caFromText[i];
    if ((int)caFromText[i] == CARRIAGE_RETURN)
    {
        // check if not reaching the end of the array
        if (i + 3 < caFromText.Length)
        {
            if ((int)caFromText[i + 1] == NEW_LINE
                && (int)caFromText[i + 2] == CARRIAGE_RETURN
                && (int)caFromText[i + 3] == NEW_LINE)
            {
                // two consecutive breaks were detected
                textResult += "\n ";
                // jump to the next break
                ++i;
            }
        }
    }
}

My idea was to turn the string into an array of characters and then iterate over each character looking for a sequence of consecutive line breaks. To avoid running past the end of the array, I used a guard that checks whether we are reaching the end of the array. When the condition is met, a space is concatenated to the ‘textResult’ string, which stores the final result.

Rik Hemsley gathered all the proposed answers in a nice Visual Studio solution. He also added a series of tests to check the validity of all the implementations:

HSBenchmakrPlayground

I was glad to see that my algorithm did pass all the minimum tests required for adding a space between two consecutive line breaks. However, its speed was not good at all, since it uses a string to concatenate the result. I knew it was not the right way to do it, and I even mentioned it in my comment, but I did not imagine how bad concatenation could turn out to be. Using Rik Hemsley’s benchmark, I found that my algorithm parsed a file of 2,106,233 characters in approximately 107 minutes, that is, 1 hour and 47 minutes!

To fix the problem, I replaced the string with the StringBuilder class. Strings in the .NET framework are immutable: every time we concatenate to a string, a new String object is created in memory holding the old value plus the value to append. The Append method of the StringBuilder class avoids creating a new string on every concatenation. Here is my revised version:

using System.Text;
using NUnit.Framework;

namespace KirilQuestion.Implementations
{
    [TestFixture]
    public class JdecuyperRevisited : InsertSpacesFixture
    {
        private const char CARRIAGE_RETURN = '\r';
        private const char NEW_LINE = '\n';

        public override string InsertSpaceBetweenCrLfs(string input)
        {
            // StringBuilder appends in place instead of allocating a new string each time
            var textResult = new StringBuilder(input.Length);

            var caFromText = input.ToCharArray();

            for (var i = 0; i < caFromText.Length; ++i)
            {
                textResult.Append(caFromText[i]);

                if (caFromText[i] == CARRIAGE_RETURN)
                {
                    // check if not reaching the end of the array
                    if (i + 3 < caFromText.Length)
                    {
                        if (caFromText[i + 1] == NEW_LINE
                            && caFromText[i + 2] == CARRIAGE_RETURN
                            && caFromText[i + 3] == NEW_LINE)
                        {
                            // two consecutive breaks were detected
                            textResult.Append("\n ");

                            // jump to the next break
                            ++i;
                        }
                    }
                }
            }
            return textResult.ToString();
        }
    }
}

This latest version of my code processed the same file in 58.5738 milliseconds, more than 100,000 times faster than my first version (107 minutes is about 6,420,000 milliseconds). Concatenation carries a very high cost that we are better off avoiding. Still, do not forget that a StringBuilder is not recommended in every situation. Do not use one when the concatenations are minimal: you will most likely make a small piece of code much harder to read, and you create an additional object in memory. Even if StringBuilder is more efficient for a few small concatenations, the gain is never significant.

If you download the benchmark, you will see that the fastest solution is Jordan Terrell's, which uses a very powerful regular expression:
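To get a feel for the difference without downloading the benchmark, here is a minimal sketch that times both approaches. It is not part of Rik Hemsley's solution: the class name `ConcatBenchmark`, the method names and the size of 50,000 characters are my own assumptions, picked only so the slow version finishes quickly.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper, not part of the original benchmark solution.
public static class ConcatBenchmark
{
    // Builds a string of `count` characters with the += operator.
    public static string WithConcatenation(int count)
    {
        var result = "";
        for (var i = 0; i < count; ++i)
        {
            result += 'x'; // allocates a brand new string on every pass
        }
        return result;
    }

    // Builds the same string by appending to a single buffer.
    public static string WithStringBuilder(int count)
    {
        var builder = new StringBuilder(count);
        for (var i = 0; i < count; ++i)
        {
            builder.Append('x');
        }
        return builder.ToString();
    }

    public static void Main()
    {
        const int count = 50000; // arbitrary size, assumed for illustration

        var watch = Stopwatch.StartNew();
        var slow = WithConcatenation(count);
        watch.Stop();
        Console.WriteLine("+= operator:   {0} ms", watch.ElapsedMilliseconds);

        watch.Restart();
        var fast = WithStringBuilder(count);
        watch.Stop();
        Console.WriteLine("StringBuilder: {0} ms", watch.ElapsedMilliseconds);

        Console.WriteLine("Same output: {0}", slow == fast);
    }
}
```

The timings depend on the machine, so I am not quoting any numbers here. The gap should grow quickly as `count` increases, because each += copies everything that was built so far.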

string output = Regex.Replace(input, @"(\r\n)(?=\r\n)", "$1 ");

The regular expression parses the file in less than 18 milliseconds! That is still about 3 times faster than my latest solution. To understand the regular expression, let's work with a slightly simpler example in which I replaced '\r\n' with '_'. Our input text will be:

string input = "hello world __";

First we apply a simple regular expression:

string outWithoutGroups = Regex.Replace(input, @"__", "_ _");

The result looks pretty good, since a space appears between the two _:

hello world _ _

However, if we change the input text slightly, things get complicated:

string input = "hello world ______";

The output is now:

hello world _ __ __ _

What is going on? After the words "hello world" there are 6 characters that we will call 1, 2, 3, 4, 5 and 6 (the 6 underscores). Every time the expression finds a match, i.e. two consecutive _ characters, it replaces them with _ _. The first match consumes characters 1 and 2, the next one consumes 3 and 4, and the last one consumes 5 and 6. The problem is that matches never overlap, so characters 3 and 5 were only ever looked at as the start of a new match and never as the character that follows 2 or 4. That is why no space appears between 2 and 3 or between 4 and 5.

Kirill Osenkov made an image that clearly shows what our regular expression is failing to process:
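A quick way to see which characters the expression actually looks at is to print where each match starts. This is only a sketch: the class `MatchPositions` and its method are names I made up for the illustration, and I left out the "hello world" prefix so the indexes line up with the underscores.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, used only to illustrate non-overlapping matches.
public static class MatchPositions
{
    // Returns the start index of every match of `pattern` in `input`.
    public static int[] StartIndexes(string input, string pattern)
    {
        var matches = Regex.Matches(input, pattern);
        var indexes = new int[matches.Count];
        for (var i = 0; i < matches.Count; ++i)
        {
            indexes[i] = matches[i].Index;
        }
        return indexes;
    }

    public static void Main()
    {
        // six underscores, zero-based indexes 0 to 5
        const string input = "______";

        // prints "0, 2, 4": no match ever starts at index 1 or 3
        Console.WriteLine(string.Join(", ", StartIndexes(input, "__")));
    }
}
```

Since every match swallows two characters, the search resumes two positions further each time, and the pairs that straddle two matches are never considered.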

reemplazar_underscore

To fix that problem, Jordan Terrell used a construct called a lookahead assertion, which was introduced back in the day by Perl 5. The assertion is written '?=' and lets us match a _ that is followed by another one; the second _ is not part of the match, though, and will be evaluated again on its own. That produces the correct output:

hello world _ _ _ _ _ _
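Both behaviours can be reproduced side by side with the underscore example. This is a minimal sketch under my own naming (`LookaheadDemo`, `WithoutLookahead` and `WithLookahead` are not from the benchmark); the second pattern is Jordan Terrell's expression with '\r\n' swapped for '_'.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, names chosen for this illustration only.
public static class LookaheadDemo
{
    // Consumes both underscores, so the second one cannot start a new match.
    public static string WithoutLookahead(string input)
    {
        return Regex.Replace(input, @"__", "_ _");
    }

    // Consumes only the first underscore; the lookahead just peeks at the second.
    public static string WithLookahead(string input)
    {
        return Regex.Replace(input, @"(_)(?=_)", "$1 ");
    }

    public static void Main()
    {
        const string input = "hello world ______";
        Console.WriteLine(WithoutLookahead(input)); // hello world _ __ __ _
        Console.WriteLine(WithLookahead(input));    // hello world _ _ _ _ _ _
    }
}
```

The last underscore is left alone by the second pattern because nothing follows it, which is exactly what we want: a space only between two consecutive breaks.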

Has anyone else been confronted with questions of this kind?

March 19, 2010

Don't forget to specify ImageFormat when saving an image in C#

Filed under: C# — Tags: , — jdecuyper @ 5:14 pm

Since images are usually heavy resources, you always have to pay close attention when manipulating them. I realized this while generating thumbnails from the images of a web gallery. The first version of my code was the following:

Image imagenFormatoOriginal = Image.FromFile(Directory.GetCurrentDirectory() + "//test.jpg");
Image imagenFormatoThumb = imagenFormatoOriginal.GetThumbnailImage(1000, 1000, null, IntPtr.Zero);
imagenFormatoOriginal.Dispose();
imagenFormatoThumb.Save("test_thumbnail.jpg");

The original image weighs 1,002 KB and measures 4096 by 6144 pixels. The generated thumbnail weighs 361 KB at 1000 by 1000 pixels, which is about 36% of the weight of the original image. That may look like a good ratio, but in a gallery of hundreds of images every byte matters, both for the hard disk and for the bandwidth. In the following code, we specify the format in which we want the thumbnail to be created:

Image imagenFormatoOriginal = Image.FromFile(Directory.GetCurrentDirectory() + "//test.jpg");
Image imagenFormatoThumb = imagenFormatoOriginal.GetThumbnailImage(1000, 1000, null, IntPtr.Zero);
imagenFormatoOriginal.Dispose();
imagenFormatoThumb.Save("test_thumbnail_con_formato.jpg", ImageFormat.Jpeg);

Now the generated thumbnail weighs 29 KB at 1000 by 1000 pixels, which is about 2.9% of the weight of the original image. What is going on? The MSDN article on the 'Save' method indicates that when no encoder exists for the file format of the image, which appears to be the case for a thumbnail created in memory, the Portable Network Graphics (PNG) encoder is used by default. I think it would be a better design either to force the programmer to always choose the format in which the image is saved, or to have the 'Save' method try to guess the format of the image when saving it. Examining the headers of our two generated thumbnails confirms this. The thumbnail generated without a format has a header that is specific to the PNG format:

Image in PNG format

In contrast, the header of the second image shows the characteristics of the JPG format, which is lossy and therefore much more compact, and of lower quality, than PNG:

Image in JPG format
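If you would rather not open a hex editor, the same check can be sketched in a few lines by comparing the first bytes of each file with the well-known PNG and JPEG signatures. The class `ImageHeader` and its `Detect` method are hypothetical names of mine, and the sketch only looks at the leading bytes; it does not validate the rest of the file.

```csharp
using System;
using System.IO;

// Hypothetical helper: identifies a file by its leading signature bytes only.
public static class ImageHeader
{
    // Fixed 8-byte signature found at the start of every PNG file.
    private static readonly byte[] PngSignature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // JPEG files start with the SOI marker FF D8, followed by FF.
    private static readonly byte[] JpegSignature = { 0xFF, 0xD8, 0xFF };

    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data.Length < signature.Length) return false;
        for (var i = 0; i < signature.Length; ++i)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }

    // Returns "PNG", "JPEG" or "unknown".
    public static string Detect(byte[] data)
    {
        if (StartsWith(data, PngSignature)) return "PNG";
        if (StartsWith(data, JpegSignature)) return "JPEG";
        return "unknown";
    }

    public static void Main(string[] args)
    {
        // e.g. pass test_thumbnail.jpg and test_thumbnail_con_formato.jpg
        foreach (var path in args)
        {
            Console.WriteLine("{0}: {1}", path, Detect(File.ReadAllBytes(path)));
        }
    }
}
```

Run against the two thumbnails above, I would expect the first one to be reported as PNG despite its .jpg extension, since the extension plays no part in the check.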
