flawed concepts

Sunday, 21 June 2009

badblocklocate: Find LVM2 logical volume containing a block

Posted by florian in computing, english at 19:16
When SMART tells you about possibly defective hard disk blocks, you might want to know which volumes are affected so you can take extra precautions or rewrite the data to the disk to reallocate the affected blocks.

The excellent Bad block HOWTO for smartmontools tells you how to do this, but it is too easy to get confused by all the different block numbers, sizes and offsets you have to calculate.

Therefore, I have written a small script that performs the calculations and determines the LVM2 volume that contains a given block number: badblocklocate.py.

It can determine LVM2 logical volume names from block numbers given on the command line, or it can call smartctl to determine the defective blocks automatically. Please see the comments at the top of the file for usage information.

For now, it works for LVM2 logical volumes only (though classical partitions can be identified from the error message). If time permits, I might extend it so that it also determines which file contains the defective blocks.
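
For illustration only, the core arithmetic from the HOWTO can be sketched roughly as follows. This is not the actual badblocklocate.py: the names `Segment`, `sector_to_extent` and `locate_lv` are made up for this example, all values are assumed to be in 512-byte sectors, and the partition start, the offset of the first physical extent, the extent size and the extent-to-volume map are assumed to have been looked up beforehand (e.g. with `fdisk -lu` and `pvdisplay --maps`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A contiguous run of physical extents used by one logical volume.

    Hypothetical helper type for this sketch, not part of any LVM API.
    """
    lv_name: str
    first_pe: int
    last_pe: int  # inclusive


def sector_to_extent(bad_sector: int, part_start: int,
                     pe_start: int, pe_size: int) -> Optional[int]:
    """Map an absolute disk sector to a physical extent number.

    bad_sector: sector reported by SMART (absolute, from start of disk)
    part_start: first sector of the partition holding the physical volume
    pe_start:   offset of the first physical extent inside the partition
    pe_size:    size of one physical extent
    All values are in sectors. Returns None if the sector lies before
    the extent area (i.e. in the partition gap or the LVM metadata).
    """
    offset = bad_sector - part_start - pe_start
    if offset < 0:
        return None
    return offset // pe_size


def locate_lv(bad_sector: int, part_start: int, pe_start: int,
              pe_size: int, segments: List[Segment]) -> Optional[str]:
    """Return the name of the logical volume containing bad_sector, if any."""
    pe = sector_to_extent(bad_sector, part_start, pe_start, pe_size)
    if pe is None:
        return None
    for seg in segments:
        if seg.first_pe <= pe <= seg.last_pe:
            return seg.lv_name
    return None  # extent is unallocated


# Made-up example values: partition starts at sector 63, extents start
# 384 sectors into it, and each extent is 8192 sectors (4 MiB).
segments = [Segment("root", 0, 9), Segment("home", 10, 99)]
print(locate_lv(82372, 63, 384, 8192, segments))
```

With these made-up numbers the sector falls into extent 10, which belongs to the hypothetical "home" volume.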
Comments (0) | Trackbacks (0)

Thursday, 5 June 2008

A Literature Survey on Domain Adaptation of Statistical Classifiers

Posted by florian in english, studies at 22:36
Domain adaptation (i.e., you train a statistical classifier on one type of text but want to use it on a different type of text) is one of my research interests.

Recently I found a great survey of publications on domain adaptation:
A Literature Survey on Domain Adaptation of Statistical Classifiers,
by Jing Jiang, who is a PhD candidate at UIUC and has written some interesting papers on domain adaptation herself, such as
Instance weighting for domain adaptation in NLP.
Comments (0) | Trackbacks (0)

Tuesday, 28 August 2007

"The Privacy Problem"

Posted by florian in english at 23:37
Machine learning researcher John Langford writes about "The Privacy Problem": the need to collect vast amounts of data for machine learning and data mining, and the privacy issues that come with such big data collections.

(I also think it is not just the data collection that poses privacy problems; it is also the machine learning and data mining techniques that enable us to gather information about people that would otherwise be hidden in the sea of data.)
Comments (0) | Trackbacks (0)

Thursday, 12 July 2007

Peter Norvig: Warning Signs in Experimental Design and Interpretation

Posted by florian in english at 23:33
Peter Norvig: Warning Signs in Experimental Design and Interpretation

Mistakes one can make in conducting and interpreting statistical experiments, and how you can spot them.

It also comes with two very interesting graphical examples of how "humans are very good at detecting patterns, but rather poor at detecting randomness".
Comments (0) | Trackbacks (0)

Thursday, 24 May 2007

Links (papers) of the day

Posted by florian in english, studies at 11:58
Intelligent email clients:

Dredze et al.: Feature Design for Transfer Learning
(learning to recognize mails that need a reply)


Neustaedter et al.: The Social Network and Relationship Finder:
Social Sorting for Email Triage


Boone: Concept Features in Re:Agent, an Intelligent Email Agent

(There are a lot more; these are just the ones I stumbled upon.)

I wonder why machine learning features don't play any role in real-world email clients apart from spam classification.
Comments (0) | Trackbacks (0)

Tuesday, 15 May 2007

Links of the day

Posted by florian in english, studies at 22:40
Fernando Pereira: "Zellig Harris, natural language processing, and search"
(about the differences between general language and technical languages and their implications for NLP)

Bill Softky: "How Google translates without understanding"
(El Reg article about Google's efforts in statistical machine translation)
Comments (0) | Trackbacks (0)

Thursday, 15 March 2007

The Theory of the Wall Street Journal

Posted by florian in english, studies at 23:55
"Computational linguistics in the last 20 years essentially has been the Theory of the Wall Street Journal"
Ron Kaplan (CTO Powerset), in a talk at IMS Stuttgart this afternoon.

Continue reading "The Theory of the Wall Street Journal"

Comments (2) | Trackbacks (0)

Thursday, 22 February 2007

PEAR::Translation2 considered bad.

Posted by florian in computing, english at 23:53
The application I'm working on is of course supposed to be localized to all kinds of languages (European ones first, but with a certain sinophile coworker, anything is possible). So I was looking at internationalization/localization support for PHP, and arrived at two contenders:

  • the gettext PHP extension (Tutorial)

  • PEAR::Translation2


Continue reading "PEAR::Translation2 considered bad."

Comments (0) | Trackbacks (0)
Tags for this article: php

Wednesday, 22 November 2006

UIMA integration for the Stanford Named Entity Recognizer

Posted by florian in computing, english, studies at 16:11
The Stanford NLP Group has released Named Entity Recognition software, based on Conditional Random Fields and implemented in Java.

It is pretty fast and also achieves quite good performance with the included models.

For integration into IBM's UIMA text analysis framework, I have written an Analysis Engine component that wraps the Stanford NE Recognizer.
You can download it here: stanford-ner-uima.zip
Just like the Recognizer itself it is licensed under the GPL.

Please let me know if it is useful for you.
Comments (0) | Trackbacks (0)
Tags for this article: ner, stanford, uima

Wednesday, 15 November 2006

Tech Tip: Eclipse Workspace Restorer

Posted by florian in computing, english at 23:13
I wish I had known this a few hours ago... Eclipse Workspace Re-Builder Plug-in
Comments (0) | Trackbacks (0)

Tuesday, 14 November 2006

Tech Tip: Concatenating PDF files

Posted by florian in computing, english at 15:07
Use pdftk: pdftk file1.pdf file2.pdf cat output outfile.pdf
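
If you need this from a script rather than the shell, the same call can be wrapped in a few lines of Python. This is only a sketch: the function names `pdftk_cat_command` and `concatenate_pdfs` are made up here, and it assumes `pdftk` is installed and on the PATH.

```python
import subprocess
from typing import List, Sequence


def pdftk_cat_command(inputs: Sequence[str], output: str) -> List[str]:
    """Build the argument list for: pdftk <inputs...> cat output <output>."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *inputs, "cat", "output", output]


def concatenate_pdfs(inputs: Sequence[str], output: str) -> None:
    """Run pdftk to concatenate the input files, in order, into output.

    Raises subprocess.CalledProcessError if pdftk exits with an error.
    """
    subprocess.run(pdftk_cat_command(inputs, output), check=True)


# Shows the command that would be run, without actually calling pdftk.
print(" ".join(pdftk_cat_command(["file1.pdf", "file2.pdf"], "outfile.pdf")))
```

Passing the arguments as a list avoids shell quoting trouble with file names that contain spaces.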
Comments (0) | Trackbacks (0)

Tuesday, 26 September 2006

Information Food Chain

Posted by florian in english, studies at 11:30
Etzioni (1996!): "I view the World Wide Web as an information food chain. The maze of pages and hyperlinks that comprise the Web are at the very bottom of the chain. The WebCrawlers and Alta Vistas of the world are information herbivores; they graze on Web pages and regurgitate them as searchable indices. Today, most Web users feed near the bottom of the information food chain, but the time is ripe to move up. Since 1991, we have been building information carnivores, which intelligently hunt and feast in Unix, on the Internet, and on the Web".

Nice metaphor.
(Save for the "bots" and "agents" rhetoric that follows.)

Etzioni is building "information omnivores" now.
Comments (0) | Trackbacks (0)

BIND slow on SLES9? Disable IPv6

Posted by florian in computing, english at 11:24
If you experience slow out-of-cache responses with the stock BIND (9.2.3) nameserver on SuSE Linux Enterprise Server (SLES9), try disabling the ipv6 kernel module. This seems to be related to a similar issue in CentOS.
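
To check whether the module is loaded in the first place, you can look for it in /proc/modules. A minimal sketch, with made-up function names; it only detects the module and does not disable it, and it will not notice IPv6 support that is compiled into the kernel instead of being built as a module.

```python
def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if a module called `name` is listed in /proc/modules text.

    Each line of /proc/modules starts with the module name, followed by
    whitespace-separated fields (size, use count, ...).
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def ipv6_module_loaded(path: str = "/proc/modules") -> bool:
    """Read the given modules file (Linux only) and look for 'ipv6'."""
    with open(path) as f:
        return module_loaded("ipv6", f.read())


# Illustrative sample in the /proc/modules line format (values made up).
sample = (
    "ipv6 238368 12 - Live 0xf8a3c000\n"
    "ext3 116488 2 - Live 0xf892e000\n"
)
print(module_loaded("ipv6", sample))
```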
Comments (3) | Trackbacks (0)

Tuesday, 15 August 2006

So called "language-independent"

Posted by florian in english, studies at 14:31
Is it just me, or does the term "language-independent" (Named Entity Recognition, or whatever) seem a bit pompous, when in fact all papers I've seen on the subject instead suggest ways to train multiple single-language classifiers, provided that you've got corpora for all languages and even genres?
Comments (0) | Trackbacks (0)
Tags for this article: ner

Wednesday, 21 June 2006

Social Bookmarking and the problems of choice...

Posted by florian in computing, english at 16:54
A while ago I started managing my bookmarks (those that are suitable for the public, mind you) with del.icio.us. I really like it, and I'd also like to use it for the bibliography of my master's thesis, but del.icio.us has a big drawback for that: it doesn't seem to support any electronic bibliography formats such as BibTeX.
Furthermore, scientific publications have some more metadata (like the author) that one would like to have searchable, and not in the regular tags, please (if you think it should be in the tags, please tell me why).

There are indeed alternatives focusing on scientific publications, and - as with all great ideas and their good follow-ons - even more than one. Even though choice is great in principle, I'm not too happy about this: for social software you want network effects, and spreading the users over several sites hurts them.
And I have to decide: do I stay with del.icio.us and its large user base, even though it has no support for citation software (maybe abusing the description field for BibTeX data)? Or do I switch to a more scientifically oriented service? And which one should I choose? I know of Connotea, BibSonomy and CiteULike. Any recommendations?
Comments (0) | Trackbacks (0)